- Teacher: Stefania Mignani
- Credits: 10
- SSD: SECS-S/01
- Language: Italian
- Modules: Stefania Mignani (Module 1); Francesca Fortunato (Module 2)
- Teaching Mode: Traditional lectures (Module 1); Traditional lectures (Module 2)
- Campus: Bologna
- Course: Second cycle degree programme (LM) in Statistics, Economics and Business (cod. 8876)
Learning outcomes
This course will present statistical methods that have proven to be of value in the field of knowledge discovery in databases, with special attention to techniques that help managers to make intelligent use of these repositories by recognizing patterns and making predictions.
In particular, this course seeks to enable the student:
- to correctly plan a data mining process
- to choose the most suitable methodology for the problem at hand
- to critically interpret the results
Course contents
Pre-requisites:
Elements of descriptive and inferential statistics. Elements of probability theory. The multiple linear regression model.
Course content:
Part I
- Introduction: Data Mining and Statistics
- Data preparation: data discovery, data characterization, descriptive and exploratory statistics.
- Data cleaning: outliers and missing values.
- Variable transformations. Volume and dimension reduction techniques.
- Association rules
- Introduction to statistical learning methods: regression and classification problems.
- Parametric prediction methods: linear models in regression problems; logistic regression.
- Recursive partitioning methods and decision trees.
- Model assessment criteria in regression and classification problems (ROC curve and lift curve).
Part II
- Nonparametric regression methods: smoothers, Generalized Additive Models. Nonparametric classifiers: k-nearest neighbours (k-NN) classifier, Naive Bayes classifier.
- Artificial neural networks: multilayer perceptrons; regularization techniques.
- Aggregation of prediction models.
- Clustering methods: hierarchical and partitioning methods.
Some additional computer laboratory sessions on SAS Enterprise Miner and R are planned.
Readings/Bibliography
In addition to the teaching material provided by the lecturer (available at http://campus.unibo.it/ ), the following references are recommended as additional reading:
Hastie T., Tibshirani R., Friedman J. (2008) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.
Cerioli A., Zani M. (2007) Analisi dei dati e data mining per le decisioni aziendali. Giuffrè Editore.
Teaching methods
The course consists of lectures and computer laboratory activities in SAS and R: lectures deal with the methodological aspects of the statistical tools listed in the course content, while computer laboratory sessions focus on applying data mining algorithms to specific case studies.
The laboratory exercises aim to strengthen the knowledge students acquire during the lectures and to develop their skills in choosing the most appropriate methods for a given problem and in interpreting the results.
Assessment methods
Assessment is based on a single final written exam. It consists of open and multiple-choice questions on theoretical aspects, and questions that require interpreting and commenting on the output of a data mining analysis.
Teaching tools
Blackboard; PC; video projector; computer laboratory
Office hours
See the website of Stefania Mignani
See the website of Francesca Fortunato