40720 - DATA MINING

Academic Year 2021/2022

  • Teacher: Silvia Emili
  • Credits: 6
  • SSD: SECS-S/03
  • Language of instruction: English
  • Teaching mode: Traditional, in-person lectures
  • Campus: Forlì
  • Course: Laurea Magistrale (second-cycle degree) in Economia e management (cod. 9203)

    Also valid for Laurea Magistrale in Economia e management (cod. 9203)

Learning outcomes

This course presents the main statistical methods used for knowledge discovery in business databases, with special attention to techniques that help identify relationships of interdependence and patterns in business phenomena. In particular, the course seeks to enable the student to:

  • correctly plan a data mining process;
  • choose the statistical methodology best suited to the problem at hand;
  • critically interpret empirical results;
  • use these results in the business decision process.

Course contents

  • Part 1: Introduction to descriptive and inferential statistics

    Central tendency and dispersion; frequency tables; sample and population; hypothesis testing and confidence intervals. LAB: STATA tutorial on data visualization, hypothesis testing and confidence intervals.

  • Part 2: Introduction to STATA

    LAB: STATA tutorial on organization of data and data preprocessing with real datasets. LAB: STATA tutorial on descriptive and inferential statistics.

  • Part 3: Clustering

    Distance metrics; clustering algorithms. LAB: STATA tutorial on classification techniques applied to two real case studies:

  1. industrial clusters;
  2. satisfaction questionnaires.

  • Part 4: Regression (introduction)

    Simple linear regression; multiple linear regression; least squares estimation. LAB: STATA tutorial on linear regression applied to two real case studies:

  1. hedonic regression;
  2. production functions.
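
As a rough illustration of the least squares estimation listed in Part 4, the sketch below fits a simple linear regression using the closed-form estimates (slope = S_xy / S_xx, intercept = mean of y minus slope times mean of x). The course labs use STATA; Python, the function name `ols_simple` and the toy data are assumptions made purely for illustration and are not course material.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
intercept, slope = ols_simple(x, y)
print(intercept, slope)
```

In a hedonic regression, for example, `y` would be a price and `x` a product characteristic; the multiple-regression case generalises the same idea to several explanatory variables.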

Readings/Bibliography

BOOK: James, Witten, Hastie and Tibshirani (2013). An Introduction to Statistical Learning, with Applications in R. Springer.

Useful Readings:

BOOK: Hubert Gatignon (2014). Statistical Analysis of Management Data. Third Edition. Springer.

Mari, F. M., and Lohano, H. D. (2007). Measuring Production Function and Technical Efficiency of Onion, Tomato, and Chillies Farms in Sindh, Pakistan. The Pakistan Development Review, 46(4), 1053-1064.

Smith, R. A., McKinney, C. N., Caudill, S., Mixon, F. (2016). Consumer ratings and the pricing of experience goods: hedonic regression analysis of beer prices. Agricultural and Food Economics, 4(1), 1-10.

Jackson, J., and Murphy, P. (2006). Clusters in Regional Tourism: An Australian Case. Annals of Tourism Research, 33(4), 1018-1035.

Paker, N., and Vural, C. A. (2016). Customer segmentation for marinas: Evaluating marinas as destinations. Tourism Management, 56, 156-171.

Teaching methods

Lectures with slides and notes written on the board or iPad. A laptop is used for the applied tutorials with Excel and STATA.

Assessment methods

FOR ATTENDING STUDENTS:

The exam is a combination of:

1) a quiz on EOL/Zoom, including multiple-choice questions, STATA output to be interpreted, and questions on all the topics in the syllabus (20 points);

2) four weekly group assignments (carried out during the lectures), involving the comprehension of scientific articles and the analysis of real datasets (12 points, 3 points per assignment).

The grading of the QUIZ is as follows:

< 11: not sufficient (exam failed)

11-12: sufficient

13-14: satisfactory

15-16: good

17-18: very good

19-20: excellent

The grading of each ASSIGNMENT is as follows:

  • 3: the report shows thorough and complete mastery of the topic covered in the course and consistent ability in execution, with at most minor inaccuracies;
  • 2: the report shows an appreciable degree of knowledge of the topic covered in the course and good ability in execution;
  • 1: the report shows significant methodological or theoretical inaccuracies or significant analytical imperfections, indicating barely sufficient ability in execution;
  • 0: inadequate or wrong.

Laude is awarded if the sum of the two parts is greater than or equal to 31.
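
The arithmetic of the scheme for attending students can be sketched in Python as follows. This is only an illustration: the function name `attending_result` and the returned dictionary are hypothetical, and the sketch reports just the raw total, whether the quiz threshold of 11 is met, and the laude condition, without assuming how other totals map to the final grade.

```python
def attending_result(quiz, assignments):
    """Combine a quiz score (0-20) with four assignment scores (0-3 each)."""
    if not 0 <= quiz <= 20:
        raise ValueError("quiz score must be between 0 and 20")
    if len(assignments) != 4 or any(a not in (0, 1, 2, 3) for a in assignments):
        raise ValueError("expected four assignment scores, each 0, 1, 2 or 3")
    total = quiz + sum(assignments)
    return {
        "total": total,               # maximum is 20 + 12 = 32
        "quiz_passed": quiz >= 11,    # a quiz score below 11 fails the exam
        "laude": total >= 31,         # sum of the two parts at least 31
    }


# Hypothetical scores, for illustration only.
print(attending_result(19, [3, 3, 3, 3]))
print(attending_result(10, [3, 3, 3, 3]))
```

Because the assignments contribute at most 12 points, reaching 31 requires a quiz score of at least 19.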

 

FOR NON-ATTENDING STUDENTS:

The exam is a combination of:

1) a quiz on EOL/Zoom, including multiple-choice questions, STATA output to be interpreted, and questions on all the topics in the syllabus (20 points);

2) an ORAL exam (12 points).

The grading of the QUIZ is described above.

The grading of the ORAL exam is as follows:

  • 10-12: correct, exhaustive and well-developed answer, showing complete mastery of the topics covered in the course and consistent ability in execution, with at most minor inaccuracies;
  • 7-9: answer not entirely correct and/or incomplete, showing an appreciable degree of knowledge of the topics covered in the course and good ability in execution;
  • 6: answer with significant methodological or theoretical inaccuracies or significant analytical imperfections, indicating barely sufficient ability in execution;
  • 0-5: inadequate or wrong answer.

Laude is awarded if the sum of the two parts is greater than or equal to 31.

The final grade scale is as follows:

  • 18-19: knowledge of a very limited number of topics covered in the course and analytical skills that emerge only with the help of the teacher, expressed in overall correct language;
  • 20-24: knowledge of a limited number of topics covered in the course and ability to analyse autonomously only on purely procedural matters, expressed in correct language;
  • 25-29: good knowledge of a large number of topics covered in the course, ability to make independent choices in critical analysis, mastery of specific terminology;
  • 30-30L: excellent knowledge of the topics covered in the course, ability to make independent choices in critical analysis and to connect topics, full mastery of specific terminology, and capacity for argumentation and self-reflection.

Teaching tools

Software: Excel, STATA

Office hours

See the website of Silvia Emili