40720 - Data Mining

Academic Year 2019/2020

  • Teacher: Ida D'Attoma
  • Credits: 6
  • SSD: SECS-S/03
  • Language: English
  • Teaching Mode: Traditional lectures
  • Campus: Forli
  • Course: Second cycle degree programme (LM) in Economics and management (cod. 9203)


Learning outcomes

This course presents the main statistical methods used for knowledge discovery in business databases, with special attention to techniques that help identify relationships of interdependence and patterns in business phenomena. In particular, the course aims to enable students to:

  • correctly plan a data mining process;
  • choose the statistical methodology best suited to the problem at hand;
  • critically interpret empirical results;
  • use these results in the business decision process.

Course contents

  1. Introduction to data mining.

  2. Organization of data: data objects and attribute types, data matrices and their transformations.

  3. Data Preprocessing and Exploratory Analysis: data cleaning, bivariate exploratory analysis of qualitative and quantitative data.

  4. Proximity measures: distance and similarity.

  5. Hierarchical and Non-Hierarchical Cluster Analysis.

  6. Classification and prediction methods: an introduction to logistic regression. 
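
As a rough illustration of item 4, the sketch below contrasts a distance measure for quantitative attributes with a similarity measure for binary attributes. It is written in Python purely for illustration (the course itself uses SAS). The function names, the sample records, and the convention of treating two all-zero binary vectors as identical are assumptions, not course material.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def jaccard_similarity(x, y):
    """Jaccard similarity between two equal-length binary (0/1) vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumption: two all-zero vectors are treated as identical.
    return both / either if either else 1.0


# Hypothetical records: two customers described by two quantitative attributes.
customer_a = [3.0, 4.0]
customer_b = [0.0, 0.0]
distance = euclidean_distance(customer_a, customer_b)

# Hypothetical records: purchase indicators for four products.
basket_a = [1, 1, 0, 1]
basket_b = [1, 0, 0, 1]
similarity = jaccard_similarity(basket_a, basket_b)

print(f"Euclidean distance: {distance:.3f}")
print(f"Jaccard similarity: {similarity:.3f}")
```

A distance grows as objects become less alike, whereas a similarity grows as they become more alike; which one is appropriate depends on the attribute types covered in item 2.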

Readings/Bibliography

Lectures are based on selected material from the textbook listed below:

  • Tufféry, S. (2011) Data Mining and Statistics for Decision Making. John Wiley & Sons, Ltd.  

You can check its availability at: 

  • http://sol.unibo.it/SebinaOpac/Opac?sysb=&fromBiblio=
  • http://sol.unibo.it/SebinaOpac/Opac

Additional teaching material will be made available to students through the e-learning platform.

Teaching methods

Lectures present the theoretical and applied aspects of the various data mining methods. After each theoretical session, a practical tutorial is devoted to applications on real economic data. Applications are discussed and replicated during the computer laboratory session using SAS statistical software.

Self-evaluation tests will be given in class and online (through the e-learning platform).


Assessment methods

Both attending and non-attending students take an oral examination consisting of the following two parts:

1. A team project prepared in the weeks before the exam and discussed during the oral examination. The project requires you to conduct an analysis using specific statistical and data mining (DM) techniques and to discuss the findings. It requires the use of SAS software.

2. A series of questions on the theoretical content.

Check the virtual space for details.

Teaching tools

  • Lecture notes, additional teaching material, exercises, typical exam questions, SAS software demonstrations on data analysis, and online self-evaluation tests will be made available through the e-learning platform.
  • SAS 9.4 software, available at TH (Room 5) and at LABIC.
  • Kahoot software.

Office hours

See the website of Ida D'Attoma