40720 - Data Mining

Academic Year 2022/2023

  • Lecturer: Silvia Emili
  • Credits: 6
  • SSD: SECS-S/03
  • Language: English
  • Teaching Mode: Traditional lectures
  • Campus: Forlì
  • Course: Second cycle degree programme (LM) in Economics and management (cod. 9203)


Learning outcomes

This course presents statistical methods that have proven to be of value in the field of knowledge discovery in business databases, with special attention to techniques that help managers make intelligent use of data repositories by recognizing patterns and making predictions. In particular, this course enables the student to:

  • correctly plan a data mining process;
  • choose the best-suited methodology for the problem at hand;
  • critically interpret the results.

Course contents

  • Part 1: Introduction to descriptive and inferential statistics

    Central tendency and dispersion; frequency tables; sample and population; hypothesis testing and confidence intervals. LAB: STATA tutorial on data visualization, hypothesis testing and confidence intervals.

  • Part 2: Introduction to STATA

    LAB: STATA tutorial on data organization and data preprocessing with real datasets. LAB: STATA tutorial on descriptive and inferential statistics.

  • Part 3: Clustering

    Distance metrics; clustering algorithms. LAB: STATA tutorial on classification techniques applied to two real case studies:

    1. industrial clusters;
    2. satisfaction questionnaires.

  • Part 4: Regression (preliminary introduction)

    Simple linear regression; multiple linear regression; least squares estimation; quantitative and qualitative covariates. LAB: STATA tutorial on linear regression applied to two real case studies:

    1. hedonic regression;
    2. production functions.

Readings/Bibliography

BOOK: Acock, A. C. (2018). A Gentle Introduction to Stata. Sixth Edition. Stata Press.

Useful Readings:

BOOK: James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning, with Applications in R. Springer.

BOOK: Gatignon, H. (2014). Statistical Analysis of Management Data. Third Edition. Springer.

Mari, F. M., and Lohano, H. D. (2007). Measuring Production Function and Technical Efficiency of Onion, Tomato, and Chillies Farms in Sindh, Pakistan. The Pakistan Development Review, 46(4), pp. 1053-1064.

Smith, R. A., McKinney, C. N., Caudill, S., and Mixon, F. (2016). Consumer ratings and the pricing of experience goods: hedonic regression analysis of beer prices. Agricultural and Food Economics, 4(1), pp. 1-10.

Jackson, J., and Murphy, P. (2006). Clusters in Regional Tourism: An Australian Case. Annals of Tourism Research, 33(4), pp. 1018-1035.

Paker, N., and Vural, C. A. (2016). Customer segmentation for marinas: Evaluating marinas as destinations. Tourism Management, 56, pp. 156-171.

Teaching methods

Frontal lectures using slides and notes written on the board/iPad. A laptop is used to run Excel and STATA during the applied tutorials.

Assessment methods

FOR ATTENDING STUDENTS:

Exam is a combination of:

1) a quiz on EOL/Zoom, including multiple-choice questions, R output to be interpreted, and questions on all the topics included in the syllabus (20 points);

2) three weekly group homework assignments, including the comprehension of scientific articles and the analysis of real datasets (12 points: 4 points per assignment).

The grading of the QUIZ is as follows:

  • below 11: insufficient (exam failed);
  • 11-12: sufficient;
  • 13-14: satisfactory;
  • 15-16: good;
  • 17-18: very good;
  • 19-20: excellent.

The grading of each ASSIGNMENT is as follows:

  • 4: the report reveals an exhaustive and complete mastery of the topic covered in the course and consistent execution ability, with at most minor inaccuracies;
  • 3: the report reveals an adequate level of knowledge of the topic covered in the course and appreciable execution ability, with some minor inaccuracies;
  • 2: the report reveals a fair degree of knowledge of the topic covered in the course and sufficient execution ability;
  • 1: the work shows significant methodological or theoretical inaccuracies or significant analytical imperfections, indicating barely sufficient execution ability;
  • 0: the work is inadequate or wrong.

Laude will be awarded if the sum of the two parts is equal to or greater than 31.



FOR NOT ATTENDING STUDENTS:

Exam is a combination of:

1) a quiz on EOL/Zoom, including multiple-choice questions, R output to be interpreted, and questions on all the topics included in the syllabus (20 points);

2) an ORAL exam (12 points).

The grading of the QUIZ is described above.

The grading of the ORAL is as follows:

  • 10-12: a correct, exhaustive and well-articulated answer, demonstrating complete mastery of the topics covered in the course and consistent execution ability, with at most minor inaccuracies;
  • 7-9: an answer that is not entirely correct and/or incomplete, but reveals an appreciable degree of knowledge of the topics covered in the course and good execution ability;
  • 6: an answer with significant methodological or theoretical inaccuracies or significant analytical imperfections, indicating barely sufficient execution ability;
  • 0-5: answer inadequate or wrong.

Laude will be awarded if the sum of the two parts is equal to or greater than 31.

The final grade is interpreted as follows:

  • 18-19: knowledge of a very limited number of topics covered in the course and analytical skills that emerge only with the help of the teacher, expressed in generally correct language;
  • 20-24: knowledge of a limited number of topics covered in the course and ability to carry out independent analysis only on purely procedural matters, expressed in correct language;
  • 25-29: good knowledge of a large number of topics covered in the course, ability to make independent choices in critical analysis, mastery of the specific terminology;
  • 30-30L: excellent knowledge of the topics covered in the course, ability to make independent choices in critical analysis and to connect topics, full mastery of the specific terminology, and ability to argue and self-reflect.

Teaching tools

Software: Excel, STATA

Office hours

See the website of Silvia Emili