- Teacher: Claudio Sartori
- Credits: 8
- SSD: ING-INF/05
- Language: English
- Modules: Claudio Sartori (Module 1), Federico Ravaldi (Module 2)
- Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
- Campus: Bologna
- Course: Second cycle degree programme (LM) in Computer Engineering (cod. 0937)
Learning outcomes
At the end of the course, students know the principles and main use cases of data mining algorithms. They are able to understand and apply a wide range of analysis algorithms to extract useful relationships from large datasets. They can also design a process of data selection, transformation, analysis and interpretation to support strategic decisions.
Course contents
MODULE 1 (Data Mining) - Prof. Claudio Sartori
Process of knowledge discovery
- Definition of objectives
- Selection of data sources
- Filtering, reconciliation and data transformation
- Data mining
- Validation and presentation of the results
Data Mining techniques
- Classification with decision trees, neural networks and other algorithms
- Association rules
- Clustering/segmentation
Processes and systems
- Analysis of case studies
- Examples with commercial data mining systems
- Architectures of systems with data mining components
- Standards for data mining components: PMML.
MODULE 2 (Big Data Techniques) - Prof. Federico Ravaldi
At the end of the module, students know the principles, concepts and main use cases of Big Data. They are able to understand and apply new methodologies, technologies and architectures, in particular the Hadoop ecosystem and Spark. Several real case studies are presented to support this.
(Big) Data Revolution
- Trends, characteristics, opportunities and peculiarities
- Information sources and their categorization
- Technological enablers
Business Intelligence
- Data Warehouse architectures
- ETL processes
- Multidimensional modelling (DFM)
- OLAP
Hadoop & Spark
- Technologies and architectures for Big Data
- HDFS, YARN and MapReduce
- Hadoop Ecosystem
- Spark
- Data Lake
NoSQL
- NoSQL Movement
- CAP Theorem
- NoSQL databases
Case Studies
- Location Intelligence and GeoSpatial Analytics
- Real Time Analysis, Streaming Data & Kafka
- Open Data
Readings/Bibliography
Module 1
Tan, Steinbach, Kumar, "Introduction to Data Mining", Addison-Wesley, 2005. ISBN: 0321321367
or
Witten, Frank, Hall, "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann. ISBN: 0123748569 (3rd edition); ISBN: 0128042915 (4th edition, 2016)
The list of the relevant parts of the textbook will be made available at the beginning of the course.
The copies of the slides used in the classroom will be made available as an additional reference.
Module 2
Readings:
Golfarelli, Rizzi, "Data Warehouse Design: Modern Principles and Methodologies", 2009. ISBN-10: 0071610391
Martin J. Fowler and Pramodkumar J. Sadalage, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence", Addison-Wesley Professional, 2012.
The copies of the slides used in the classroom will be made available as an additional reference.
Teaching methods
Most lectures are held in traditional classrooms and are based on slides. Case studies based on open-source software are also proposed.
Students can arrange a Project Activity in Data Mining (4 CFU) directly with each teacher, choosing among the provided topics according to their own preferences.
Assessment methods
The exam consists of an oral examination.
The student must answer four questions, two for each module. Each question is evaluated as follows: fundamental ideas on the topic, 2 points; ability to discuss the technical details at the level covered in class, 0-3 points; ability to apply the concepts to examples, 0-2 points.
To take the exam, students must register through the usual UniBO web application, AlmaEsami.
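As an illustration only, the scoring scheme above can be sketched in Python. The sketch assumes that the "fundamental ideas" component ranges from 0 to 2 points and that the four question scores are simply summed; neither assumption is stated explicitly in the rules, and the names QuestionScore and exam_total are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuestionScore:
    """Score for one oral question (hypothetical helper, not an official tool)."""
    fundamentals: int       # fundamental ideas: assumed to range over 0-2
    technical_details: int  # technical details: 0-3
    application: int        # application to examples: 0-2

    def total(self) -> int:
        # Validate each component against its assumed range.
        limits = {
            "fundamentals": 2,
            "technical_details": 3,
            "application": 2,
        }
        for name, upper in limits.items():
            value = getattr(self, name)
            if not 0 <= value <= upper:
                raise ValueError(f"{name} must be between 0 and {upper}")
        return self.fundamentals + self.technical_details + self.application


def exam_total(scores: list[QuestionScore]) -> int:
    """Sum the four question scores (assumption: plain sum, no weighting)."""
    if len(scores) != 4:
        raise ValueError("the exam has four questions, two per module")
    return sum(score.total() for score in scores)


# Example: two questions for each module.
scores = [
    QuestionScore(2, 3, 2),
    QuestionScore(2, 2, 1),
    QuestionScore(2, 1, 2),
    QuestionScore(1, 3, 2),
]
print(exam_total(scores))
```

Under these assumptions each question is worth at most 2 + 3 + 2 = 7 points.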
Teaching tools
In traditional classrooms, the lectures will make extensive use of slides.
For in-class exercises, students can bring their own laptops; the teacher will provide instructions for installing the relevant software.
Office hours
See the website of Claudio Sartori
See the website of Federico Ravaldi