23741 - Statistical Methods for Data Mining

Academic Year 2014/2015

  • Lecturer: Daniela Giovanna Calò
  • Credits: 10
  • SSD: SECS-S/01
  • Language: Italian
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in STATISTICAL SCIENCES (cod. 8055)

Learning outcomes

This course presents statistical methods that have proven valuable for knowledge discovery in databases, with special attention to techniques that help managers make intelligent use of these repositories by recognizing patterns and making predictions.

In particular, this course seeks to enable students:

- to correctly plan a data mining process

- to choose the most suitable methodology for the problem at hand

- to critically interpret the results

Course contents

Prerequisites:

Elements of descriptive and inferential statistics. Elements of probability theory. The multiple linear regression model.


Course content:

- Introduction: Data Mining and Statistics

- Data preparation: data discovery, data characterization, descriptive and exploratory statistics.

- Data cleaning: outliers and missing values.

- Variable transformations. Volume and dimension reduction techniques.

- Introduction to statistical learning methods: regression and classification problems. Parametric and nonparametric approaches. Prediction error estimation methods: apparent error, the hold-out method, cross-validation techniques.

- Parametric prediction methods: linear models in regression problems; logistic regression.

- Model assessment criteria in regression and classification problems.

- Recursive partitioning methods: CART methodology.

- Artificial neural networks: multilayer perceptrons; regularization techniques.

- Partitive clustering methods; Kohonen maps.

- Association rules.


For each topic introduced by the lecturer, the corresponding data mining algorithms are applied using the R software. Exercises are based on case studies that reproduce the decision problems most frequently encountered in data mining activities (credit scoring, target marketing, market basket analysis, ...).

Additional computer laboratory sessions are planned using SAS Enterprise Miner.


Readings/Bibliography

In addition to the teaching material provided by the lecturer (available at http://campus.unibo.it/), the following references are recommended as additional reading:

Azzalini A., Scarpa B. (2004). Analisi dei dati e data mining. Springer-Verlag Italia, Milano

Giudici P. (2005). Data mining: metodi informatici, statistici e applicazioni. McGraw-Hill, Milano

Hastie T., Tibshirani R., Friedman J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York


Teaching methods

The course consists of lectures and computer laboratory activities: lectures address the methodology of the statistical tools listed in the course content, while computer laboratory exercises focus on applying data mining algorithms to specific case studies.

A computer laboratory exercise is scheduled every week, so practical exercises account for one third of the course (20 of its 60 hours). They aim to strengthen the knowledge students acquire during lectures and to develop their skills in choosing the most appropriate methods for a given problem and in interpreting the results.

Assessment methods

Assessment is based on a single final written exam lasting 1 hour. It consists of 16 questions: 8 deal with theoretical issues and the remaining 8 with interpreting and commenting on the output of a data mining analysis carried out using the R software. The mark is expressed out of 30 and is the sum of the scores for the questions answered by the student (the maximum attainable score is 32).

During the exam, the use of lecture notes, books, or electronic devices is not allowed.

Sample exam questions are available at http://campus.unibo.it/, among the teaching material uploaded by the lecturer for the academic year 2012/2013.

Teaching tools

PC; video projector; computer laboratory

Teaching material is available at http://campus.unibo.it/ (downloads are restricted to University of Bologna students).

Office hours

See the website of Daniela Giovanna Calò