40720 - Data Mining

Academic Year 2025/2026

  • Modules: (Module 1) Maria Elena Bontempi (Module 2)
  • Teaching Mode: Traditional lectures (Module 1); Traditional lectures (Module 2)
  • Campus: Forlì
  • Degree programme: Second cycle degree programme (LM) in Business Administration and Sustainability (cod. 6797)

Learning outcomes

This course will present statistical methods that have proven to be of value in the field of knowledge discovery in business databases, with special attention to techniques that help managers to make intelligent use of data repositories by recognizing patterns and making predictions. In particular, this course enables the student:
  • to correctly plan a data mining process
  • to choose the best-suited methodology for the problem at hand
  • to critically interpret the results

Course contents

The course teaches you how to use the software and tools commonly employed for data analysis, and lets you test your skills on practical applications.

The initial steps to get the class up to speed include:

1. Review of the fundamental concepts of descriptive statistics and hypothesis testing. Introduction to Stata programming.

2. Simple linear regression and multiple linear regression: OLS estimation, inference, model comparison, residual analysis, inclusion of categorical variables in the model. Further Stata programming.
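
The OLS estimation, inference and residual analysis listed in item 2 can be sketched as follows. This is only a minimal illustration in Python, not the Stata material used in the course: the function name `ols_simple` and the five data points are made up for the example, and it covers only the simple (one-regressor) case.

```python
import math


def ols_simple(x, y):
    """Fit y = b0 + b1*x by ordinary least squares (closed-form solution)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, slope is not identified")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b1 = sxy / sxx               # slope estimate
    b0 = mean_y - b1 * mean_x    # intercept estimate

    # Residual analysis: e_i = y_i - fitted value
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ssr = sum(e ** 2 for e in residuals)          # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)     # total sum of squares

    # Inference on the slope under the classical (homoskedastic) assumptions
    sigma2 = ssr / (n - 2)                        # error variance estimate
    se_b1 = math.sqrt(sigma2 / sxx)               # standard error of b1
    t_b1 = b1 / se_b1 if se_b1 > 0 else math.inf  # t statistic for H0: b1 = 0

    r2 = 1 - ssr / sst if sst > 0 else float("nan")
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t_b1": t_b1,
            "r2": r2, "residuals": residuals}


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = ols_simple(x, y)
print(f"intercept={fit['b0']:.3f} slope={fit['b1']:.3f} "
      f"se(slope)={fit['se_b1']:.3f} t={fit['t_b1']:.2f} R2={fit['r2']:.4f}")
```

The t statistic would be compared with a Student t distribution with n - 2 degrees of freedom. With an intercept in the model, the residuals sum to zero by construction, which is a quick sanity check on any fit.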

Readings/Bibliography

The material (articles, commented notes and slides, programs and data-sets) will be distributed during the lectures and made available on the Virtuale platform.
The reference textbook is:
Wooldridge, J. M. (2020) Introductory Econometrics: A Modern Approach, 7th Edition, Cengage, chapters 1, 2, 3, 4, 6, 7, 8.

For an overview of Stata: Baum, C. F. (2006) An Introduction to Modern Econometrics Using Stata, Stata Press. Why program in Stata? See Cox, N. J. (2001) Speaking Stata: How to repeat yourself without going mad, The Stata Journal, 1(1), pp. 86–97.

Teaching methods

To ensure a smooth transition from theory to practice in econometrics, theoretical lectures are combined with working sessions. During the practical empirical applications, you will use the computer and Stata econometric software (available with a CAMPUS licence and your university credentials).
At the end of the course, you will be able to critically evaluate articles that present basic empirical analyses and to model and estimate your own regression of interest, using the most appropriate methods according to the problem you face.

Assessment methods

Attending students: homework will be assigned during the course. These exercises are intended to reinforce the concepts covered in class; to replicate, on new data, the guided empirical analyses carried out together in class; to familiarise you with the software (a professional tool); and, above all, to help you understand how to interpret the results, which is how you prepare for the individual exam. Students can work alone or in groups of up to 4 participants. Overall, homework accounts for 40% of the final grade.

The remaining 60% of the grade comes from an individual final exam, taken in class after registering on AlmaEsami. The final exam takes place on the EOL platform, where you will find Stata output and questions about the results, which you answer with your own comments and considerations. Having done the homework will strengthen your ability to interpret this output.
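
One possible reading of the 40%/60% split for attending students is a weighted average. This assumes, and it is not stated above, that homework and exam are both marked on the 30-point scale; the symbols $H$ (homework mark), $E$ (exam mark) and $G$ (final grade) and the marks used are purely illustrative.

```latex
\[
G = 0.4\,H + 0.6\,E, \qquad
\text{e.g. } H = 28,\ E = 25 \;\Rightarrow\; G = 0.4 \cdot 28 + 0.6 \cdot 25 = 11.2 + 15 = 26.2
\]
```

How the result is rounded to a final mark is not specified here.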

Non-attending students will be assessed only on the individual exam and must be prepared on the entire course programme, which can be consulted on Virtuale.

The final grade may be:
30L: excellent work!
28-30: independent knowledge and skills leading to good understanding and analytical ability.
24-27: the degree of independent knowledge is appreciable.
18-23: rather random answers, lacking structure or logical order, with theoretical and methodological inaccuracies.
<18: incorrect, copied or missing answers.

Teaching tools

Theoretical lectures are paired with working sessions, during which you will receive the guidance needed to run your own empirical analysis. The data-sets and the programming files needed to perform applied analyses will be provided during the lectures. The distributed material will be made available on the Virtuale platform. A virtual room on Teams will be available in case you cannot physically attend a lecture, and for communicating via chat.

Stata software: https://www.unibo.it/secure/software-stata

Office hours

See the website of Maria Elena Bontempi

See the website of

SDGs

  • Quality education
  • Gender equality
  • Industry, innovation and infrastructure
  • Climate action

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.