40720 - DATA MINING

Academic Year 2023/2024

  • Teacher: Silvia Emili
  • Credits: 6
  • SSD: SECS-S/03
  • Language of instruction: English

Learning outcomes

This course presents statistical methods that have proven valuable for knowledge discovery in business databases, with special attention to techniques that help managers make intelligent use of data repositories by recognizing patterns and making predictions. In particular, the course enables students to:

  • correctly plan a data mining process;
  • choose the methodology best suited to the problem at hand;
  • critically interpret the results.

Course contents

  • Part 1: Introduction to descriptive and inferential statistics

    Central tendency and dispersion; frequency tables; sample and population; hypothesis testing and confidence intervals. LAB: STATA tutorial on data visualization, hypothesis testing and confidence intervals.

  • Part 2: Introduction to STATA

    LAB: STATA tutorial on organization of data and data preprocessing with real datasets. LAB: STATA tutorial on descriptive and inferential statistics.

  • Part 3: Clustering

    Distance metrics and measures; clustering algorithms. LAB: STATA tutorial on classification techniques applied to two real case studies:

    1. industrial clusters;
    2. satisfaction questionnaires.
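
To make the "distance metrics" topic of Part 3 concrete, here is a minimal Python sketch of two common measures, Euclidean and Manhattan distance. The course labs use STATA; the function names and the sample points below are illustrative assumptions, not course material.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Two hypothetical observations described by two variables each.
p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

Clustering algorithms group observations that are close under the chosen measure, so the choice of distance can change which observations end up in the same cluster.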

Readings/Bibliography

BOOK: Acock, A. C. (2018) A Gentle Introduction to Stata. Sixth edition. Stata Press Publication.

Useful Readings:

BOOK: James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning, with Applications in R. Springer.

BOOK: Gatignon, H. (2014). Statistical Analysis of Management Data. Third Edition. Springer.

Jackson, J. and Murphy, P. (2006). Clusters in Regional Tourism. An Australian Case. Annals of Tourism Research, 33(4), pp. 1018-1035.

Paker, N. and Vural, C. A. (2016). Customer segmentation for marinas: Evaluating marinas as destinations. Tourism Management, 56, pp. 156-171.

Delgado, M., Porter, M.E. and Stern, S. (2016). Defining clusters of related industries. Journal of Economic Geography, 16, pp. 1–38.

Teaching methods

Lectures with slides and notes written on the board or iPad. Laboratory sessions with Excel and STATA for the software tutorials.

Assessment methods

FOR ATTENDING STUDENTS:

The exam is a combination of:

1) a written exam with multiple-choice questions, STATA output to be interpreted, and questions on all the topics included in the syllabus (20 points);

2) three weekly group homework assignments, covering the comprehension of scientific articles and the analysis of real datasets (12 points, 4 points for each assignment).

The grading of the written part of the exam is as follows:

< 11: not sufficient (exam failed)

11-12: sufficient

13-14: satisfactory

15-16: good

17-18: very good

19-20: excellent

The grading of each ASSIGNMENT is as follows:

  • 4: the report shows exhaustive and complete mastery of the topic covered in the course and consistent execution ability, with at most minor inaccuracies;
  • 3: the report shows an adequate level of knowledge of the topic covered in the course and appreciable execution ability, with some minor inaccuracies;
  • 2: the report shows a fair degree of knowledge of the topic covered in the course and sufficient execution ability;
  • 1: the report shows significant methodological or theoretical inaccuracies or significant analytical flaws, indicating barely sufficient execution ability;
  • 0: inadequate or wrong.

Laude is awarded if the sum of the two parts is equal to or greater than 31.
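
As a rough sketch of the arithmetic above for attending students, the Python function below adds the written score (0-20) to the three assignment scores (0-4 each) and applies the laude rule. The function name `final_grade`, the input checks, and the assumption that a total below 18 counts as a fail (inferred from the final grade scale, which starts at 18) are illustrative and not official rules.

```python
def final_grade(written, assignments):
    """Combine the written exam (0-20) with three assignment scores (0-4 each)."""
    if not 0 <= written <= 20:
        raise ValueError("written score must be between 0 and 20")
    if len(assignments) != 3 or any(not 0 <= a <= 4 for a in assignments):
        raise ValueError("expected three assignment scores between 0 and 4")

    # A written score below 11 is "not sufficient": the exam is failed.
    if written < 11:
        return "fail"

    total = written + sum(assignments)

    # Laude is awarded when the two parts sum to 31 or more.
    if total >= 31:
        return "30L"

    # Assumption: totals below 18 are not passing, since the final scale starts at 18.
    if total < 18:
        return "fail"

    return str(total)


print(final_grade(15, [3, 3, 2]))  # 23
print(final_grade(20, [4, 4, 3]))  # 30L
```

The same sum applies to non-attending students, with the oral score (0-12) taking the place of the three assignments.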

FOR NON-ATTENDING STUDENTS:

The exam is a combination of:

1) a written exam with multiple-choice questions, STATA output to be interpreted, and questions on all the topics included in the syllabus (20 points);

2) an ORAL exam (12 points).

The grading of the written part of the exam is described above.

The grading of the ORAL is as follows:

  • 10-12: correct, exhaustive, and well-developed answer that shows complete mastery of the topics covered in the course and consistent execution ability, with at most minor inaccuracies;
  • 7-9: answer not entirely correct and/or incomplete, showing an appreciable degree of knowledge of the topics covered in the course and good execution ability;
  • 6: answer with significant methodological or theoretical inaccuracies or significant analytical flaws, showing barely sufficient execution ability;
  • 0-5: inadequate or wrong answer.

Laude is awarded if the sum of the two parts is equal to or greater than 31.

The final grade scale is as follows:

  • 18-19: knowledge of a very limited number of topics covered in the course and analytical skills that emerge only with the teacher's help, expressed in overall correct language;
  • 20-24: knowledge of a limited number of topics covered in the course and ability to carry out independent analysis only on purely procedural matters, expressed in correct language;
  • 25-29: good knowledge of a large number of topics covered in the course, ability to make independent choices in critical analysis, mastery of the specific terminology;
  • 30-30L: excellent knowledge of the topics covered in the course, ability to make independent choices in critical analysis and to connect topics, full mastery of the specific terminology, and capacity for argumentation and self-reflection.

Teaching tools

Software: Excel, STATA

Office hours

See the website of Silvia Emili