- Teacher: Gianluca Moro
- Credits: 5
- SSD: ING-INF/05
- Language: Italian
- Teaching Mode: In-person learning (entirely or partially)
- Campus: Rimini
- Course: Second cycle degree programme (LM) in Statistical, Financial and Actuarial Sciences (cod. 8613)
Learning outcomes
At the end of the course, the student knows the main issues and techniques underlying automatic data analysis for discovering new knowledge that helps to understand and forecast phenomena of interest. The student also learns the knowledge discovery process, which includes defining goals, collecting and selecting data, preparing observations (i.e. instances), and applying data mining techniques and algorithms together with methods for validating results. In particular, the student is able to define a knowledge discovery process in specific enterprise and financial application domains, to extract knowledge models by applying appropriate techniques and algorithms to solve a discovery problem, and to validate and interpret the results.
Course contents
Introduction to the knowledge discovery process and to data mining techniques for both structured data and unstructured text (e.g. web pages, documents), according to the Cross-Industry Standard Process for Data Mining (CRISP-DM):
- definition of goals, collection, comprehension and reconciliation of data in data warehousing (DW)
- OLTP and OLAP; introduction to DW: definition, architecture and design
- multi-dimensional data model: facts, measures, dimensions, hierarchies, cuboids
- star and snowflake schemas
- operations according to the multi-dimensional model: roll-up, drill-down, slice and dice, pivot, data cube
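As a minimal sketch of the multi-dimensional operations listed above, the following Python snippet applies roll-up, slice and dice to a tiny in-memory fact table. The table, its dimension names (`year`, `region`, `product`), the measure `sales` and all values are hypothetical and chosen only for illustration; a real data warehouse would run these operations through an OLAP engine rather than plain Python.

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions (year, region, product)
# and one measure (sales). All values are invented for illustration.
FACTS = [
    {"year": 2023, "region": "North", "product": "Loan", "sales": 100},
    {"year": 2023, "region": "South", "product": "Loan", "sales": 80},
    {"year": 2023, "region": "North", "product": "Policy", "sales": 50},
    {"year": 2024, "region": "North", "product": "Loan", "sales": 120},
    {"year": 2024, "region": "South", "product": "Policy", "sales": 70},
]


def roll_up(facts, dims, measure="sales"):
    """Sum the measure over the chosen dimensions (one cuboid of the cube)."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Slice: fix a single dimension to one value."""
    return [row for row in facts if row[dim] == value]


def dice_cube(facts, conditions):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [
        row for row in facts
        if all(row[d] in allowed for d, allowed in conditions.items())
    ]


# Finer cuboid (year, region); dropping "region" rolls it up to (year,).
by_year_region = roll_up(FACTS, ["year", "region"])
by_year = roll_up(FACTS, ["year"])

only_2023 = slice_cube(FACTS, "year", 2023)
north_loans = dice_cube(FACTS, {"region": {"North"}, "product": {"Loan"}})
```

Drill-down is the inverse of roll-up: moving from `by_year` back to `by_year_region` adds a dimension and exposes finer-grained totals.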
Case studies developed with the open source tool WEKA and with commercial software:
- developing a data warehouse with Microsoft SQL Server and performing classification and clustering
- predicting, in a financial context, customers' ability to repay their loans and/or detecting insurance fraud, and predicting company defaults
- exploiting unstructured text variables in the previous analyses in order to better predict or explain the phenomenon of interest
- market basket analysis, e.g. discovering combinations of products/services that tend to be bought together
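The market basket analysis case study can be illustrated with a small sketch that computes the support of item pairs and the confidence of a rule. The baskets and product names below are hypothetical, and the code counts pairs by brute force rather than using an algorithm such as Apriori, so it is suitable only for very small examples.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets of financial products held by five customers.
BASKETS = [
    {"current account", "credit card", "home insurance"},
    {"current account", "credit card"},
    {"current account", "mortgage", "home insurance"},
    {"credit card", "home insurance"},
    {"current account", "credit card", "mortgage"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)


support = pair_support(BASKETS)
rule_conf = confidence(BASKETS, "current account", "credit card")
```

Here the pair (credit card, current account) appears in three of the five baskets, and three of the four customers with a current account also hold a credit card.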
Readings/Bibliography
- online chapters 4, 6 and 8 of the book Introduction to Data Mining by Tan, Steinbach, Kumar, Addison-Wesley, 2005. ISBN: 0321321367
- lecture notes supplied by the teacher
Suggested readings:
- The WEKA manual
- chapters 1, 2 and 11 of the book Pro SQL Server 2008 Analysis Services by Philo Janus and Guy Fouche, 2010. ISBN: 9781430219958
Teaching methods
Theoretical lectures are followed by laboratory exercises in which students tackle and solve the problems proposed during the lessons.
Assessment methods
Laboratory exercise
Teaching tools
- Assisted activities in the laboratory
- Software and computers available not only in the laboratory but also remotely via the internet: WEKA, SQL Server Analysis Services
- Course web site with lectures and laboratory exercises
Office hours
See the website of Gianluca Moro