75194 - Data Mining M

Academic Year 2018/2019

  • Modules: Claudio Sartori (Module 1), Federico Ravaldi (Module 2)
  • Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Computer Engineering (code 0937)

Learning outcomes

At the end of the course, students know the principles and the main use cases of data mining algorithms. They are able to understand and apply a wide set of analysis algorithms to extract useful relationships from large datasets. They can also design a process of data selection, transformation, analysis and interpretation to support strategic decisions.

Course contents

MODULE 1 (Data Mining) - Prof. Claudio Sartori

Process of knowledge discovery

  • Definition of objectives
  • Selection of data sources
  • Filtering, reconciliation and data transformation
  • Data mining
  • Validation and presentation of the results

Data Mining techniques

  • Classification with decision trees, neural networks and other algorithms
  • Association rules
  • Clustering/segmentation

Processes and systems

  • Analysis of case studies
  • Examples with commercial data mining systems
  • Architectures of systems with data mining components
  • Standards for data mining components: PMML

 

MODULE 2 (Big Data Techniques) - Prof. Federico Ravaldi

At the end of the module, students know the principles, concepts and main use cases of Big Data. They are able to understand and apply new types of methodologies, technologies and architectures, in particular the Hadoop ecosystem and Spark. Several real case studies presented during the module support these outcomes.

(Big) Data Revolution

  • Trends, characteristics, opportunities and peculiarities
  • Information sources and their categorization
  • Technological enablers

Business Intelligence

  • Data Warehouse architectures
  • ETL processes
  • Multidimensional modelling (DFM)
  • OLAP

Hadoop & Spark

  • Technologies and architectures for Big Data
  • HDFS, YARN and MapReduce
  • Hadoop Ecosystem
  • Spark
  • Data Lake

NoSQL

  • NoSQL Movement
  • CAP Theorem
  • NoSQL databases

Case Studies

  • Location Intelligence and GeoSpatial Analytics
  • Real Time Analysis, Streaming Data & Kafka
  • Open Data

Readings/Bibliography

Module 1

Tan, Steinbach, Kumar, "Introduction to Data Mining", Addison-Wesley, 2005. ISBN: 0321321367

or

Witten, Frank, Hall, "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann. ISBN: 0123748569 (3rd edition); ISBN: 0128042915 (4th edition, 2016)

The list of the relevant parts of the textbook will be made available at the beginning of the course.

The copies of the slides used in the classroom will be made available as an additional reference.

Module 2

Readings:

Rizzi, Golfarelli, "Data Warehouse Design: Modern Principles and Methodologies", 2009. ISBN-10: 0071610391

Martin J. Fowler and Pramodkumar J. Sadalage, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence", Addison-Wesley Professional, 2009.

The copies of the slides used in the classroom will be made available as an additional reference.


Teaching methods

Most lectures are held in traditional classrooms and rely on slides. Case studies based on open-source software are also presented.

Students can arrange a Project Activity in Data Mining (4 CFU) directly with the teachers, choosing among the provided topics according to their own preferences.

Assessment methods

The exam consists of an oral examination.

The student must answer four questions, two for each module. Each question is evaluated as follows: fundamental ideas on the topic, 2 points; ability to discuss the technical details at the level covered in class, 0-3 points; ability to apply the concepts to examples, 0-2 points.

To take the exam, students must register through the usual UniBO web application, AlmaEsami.

Teaching tools

In traditional classrooms, the lectures make extensive use of slides.

For the classroom exercises, students can bring their own laptops; the teacher will provide instructions for installing the relevant software.

Office hours

See the website of Claudio Sartori

See the website of Federico Ravaldi