69843 - Data Mining Processes and Techniques

Academic Year 2014/2015

Teacher: Gianluca Moro
Credits: 5
SSD: ING-INF/05
Language: Italian

Teaching Mode: In-person learning (entirely or partially)
Campus: Rimini
Course: Second cycle degree programme (LM) in Statistical, Financial and Actuarial Sciences (cod. 8613)

Learning outcomes

At the end of the course, the student knows the main issues and techniques underlying automatic data analysis for discovering new knowledge useful for understanding and forecasting phenomena of interest. The student also learns the knowledge discovery process, which includes defining goals, collecting and selecting data, preparing observations (i.e. instances), and applying data mining techniques and algorithms together with methods for validating the results. In particular, the student is able to define a knowledge discovery process in specific business and financial application domains, to extract knowledge models by applying appropriate techniques and algorithms to solve a discovery problem, and to validate and interpret the results.

Course contents

Introduction to the knowledge discovery process and to data mining techniques for both structured data and unstructured text (e.g. web pages, documents), following the Cross-Industry Standard Process for Data Mining (CRISP-DM):

definition of goals; collection, understanding and reconciliation of data in data warehousing (DW)

OLTP and OLAP, Introduction to DW: definition, architecture and design

multi-dimensional data model: facts, measures, dimensions, hierarchies, cuboids

star and snowflake schemas

operations according to the multi-dimensional model: roll-up, drill-down, slice and dice, pivot, data cube

selection and transformation of data into observations

application of data mining techniques (classification with decision trees, association rules, data clustering), also to unstructured text, for processing web pages, posts and documents in general

validation of results (i.e. efficacy of discovered knowledge models)

deployment and export of knowledge models in a standard format such as the Predictive Model Markup Language (PMML)
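
As an illustration of the multi-dimensional operations listed above, the following is a minimal sketch in plain Python rather than in the course tools. The fact table, its dimension names (year, quarter, region, product), the sales measure and the helper functions are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical fact table: each row is one fact with four dimensions and one measure.
facts = [
    {"year": 2014, "quarter": "Q1", "region": "North", "product": "loan", "sales": 100},
    {"year": 2014, "quarter": "Q1", "region": "South", "product": "loan", "sales": 80},
    {"year": 2014, "quarter": "Q2", "region": "North", "product": "policy", "sales": 50},
    {"year": 2014, "quarter": "Q2", "region": "South", "product": "loan", "sales": 70},
    {"year": 2015, "quarter": "Q1", "region": "North", "product": "policy", "sales": 60},
]


def aggregate(rows, dims, measure="sales"):
    """Sum the measure grouped by the given dimensions (one cuboid of the data cube)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Slice: fix a single dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice_rows(rows, conditions):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in conditions.items())]


# Finer level of the time hierarchy (what a drill-down from year would show).
by_quarter = aggregate(facts, ["year", "quarter"])
# Roll-up along the time hierarchy: quarter -> year.
by_year = aggregate(facts, ["year"])
# Slice on region, then dice on region, product and year together.
north = slice_rows(facts, "region", "North")
north_loans_2014 = dice_rows(
    facts, {"region": {"North"}, "product": {"loan"}, "year": {2014}}
)

print(by_year)
print(by_quarter)
print(len(north), north_loans_2014)
```

In a real data warehouse these operations would be expressed as OLAP queries against the star or snowflake schema; the sketch only shows how each one changes the grouping or the filter.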

Case studies developed with the open source tool WEKA and with commercial software:

developing a data warehouse with Microsoft SQL Server and performing classification and clustering
predicting, in a financial context, customers' ability to repay their loans and/or detecting insurance fraud, and predicting company defaults
exploiting unstructured text variables in the previous analyses to better predict or explain the phenomenon of interest
market basket analysis, e.g. discovering combinations of products/services that tend to be bought together
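
The market basket case study can be pictured with a minimal sketch of the two standard association-rule measures, support and confidence. This is plain Python, not WEKA; the baskets, the item names and the function names are hypothetical, and the pair counting is a brute-force illustration rather than the Apriori algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of (antecedent and consequent) divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def frequent_pairs(baskets, min_support):
    """Count every pair of items by brute force and keep those above min_support."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(support({"bread", "milk"}, baskets))
print(confidence({"bread"}, {"milk"}, baskets))
print(frequent_pairs(baskets, min_support=0.5))
```

A rule such as "bread implies milk" is usually kept only if both its support and its confidence exceed thresholds chosen by the analyst.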

Readings/Bibliography

online chapters 4, 6 and 8 of the book Introduction to Data Mining by Tan, Steinbach, Kumar, Addison-Wesley, 2005. ISBN: 0321321367
lecture notes supplied by the teacher

Teaching methods

Theoretical lectures are followed by laboratory exercises in which students tackle and solve the problems posed during the lessons

Assessment methods

laboratory exercise

Teaching tools

Assisted laboratory activities
Software and computers available both in the laboratory and remotely via the internet: WEKA, SQL Server Analysis Services
Course web site with lectures and laboratory exercises

Office hours

See the website of Gianluca Moro