75194 - Data Mining M

Academic Year 2020/2021

                
Teacher: Claudio Sartori
Credits: 8
SSD: ING-INF/05
Language: English
Modules: Claudio Sartori (Module 1); Claudio Sartori (Module 2)
Teaching Mode: In-person learning (entirely or partially) (Module 1); In-person learning (entirely or partially) (Module 2)
Campus: Bologna
Degree programme: Second cycle degree programme (LM) in Computer Engineering (cod. 0937)
Also valid for: Second cycle degree programme (LM) in Artificial Intelligence (cod. 9063)
Teaching resources on Virtuale

Learning outcomes

At the end of the course the student knows and understands:
- the principles and the most relevant use cases of a wide set of Machine Learning algorithms which are used to extract relevant and actionable information from large amounts of data
- the main issues in the analysis and manipulation of Big Data
- the main frameworks available for Big Data

In particular, the student can:
- design the main steps of a Data Mining process
- choose the Machine Learning methods best suited for the process
- evaluate the quality of the result in order to support strategic and operational decisions.

Course contents

Module 1 - Machine Learning (available for 75195 Data Mining M and for 81610 Machine Learning)

What is Machine Learning: some history and motivating examples
Theory of learning
Supervised vs unsupervised learning
Classification and regression
Model Selection, validation and presentation of results
Regression
Classification with linear discrimination, decision trees, Bayesian inference, Support Vector Machines, k-nearest neighbors, logistic regression, random forests, AdaBoost
Ensemble learning, boosting, bagging
Association rules and the Apriori algorithm
Clustering/segmentation with k-means, DBSCAN, Expectation Maximization, hierarchical methods, kernel methods
Analysis of case studies
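
As an illustration of one topic in the list above, the following is a minimal sketch of the Apriori algorithm for mining frequent itemsets, written in plain Python. It is not taken from the course materials: the function name `apriori`, the toy `baskets` data and the `min_support` threshold of 0.6 are assumptions chosen for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset.

    Support is the fraction of transactions that contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


# Toy market-basket data (hypothetical, for illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=0.6)

# Confidence of the rule {beer} -> {diapers}: supp(beer, diapers) / supp(beer).
confidence = frequent[frozenset({"beer", "diapers"})] / frequent[frozenset({"beer"})]
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted against the data only if all of its smaller subsets are already known to be frequent.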

Module 2 - Data Mining (Available for 75195 Data Mining M and for 91262 - Data Mining, Text Mining and Big Data Analytics)

Architectures of systems with data mining components
CRISP-DM methodology
Enterprise Data Warehouse

Readings/Bibliography

Module 1 - Machine Learning

Introduction to machine learning / Ethem Alpaydin. - 3. ed. - Cambridge : The MIT Press, 2014. - XXII, 613 p. - link to the online version [https://ebookcentral.proquest.com/lib/unibo/detail.action?docID=3339851]
Scikit-Learn [https://scikit-learn.org/stable/documentation.html], or Python Data Science Handbook [https://jakevdp.github.io/PythonDataScienceHandbook/]

Module 2 - Data Mining

Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000); 5:13-22.

Teaching methods

Most lectures are held in traditional classrooms and are based on slides. Case studies based on open-source software are also proposed.

Interaction is also encouraged through classroom quiz tools such as Kahoot!

The laboratory activity for module 1 will be an integral part of the learning process.

Assessment methods

Module 1

Understanding of the theoretical and practical notions is tested through multiple choice questions, administered with the IoL system. To pass, the student must correctly answer at least half of the questions plus one. The weight of this part is 50%.

The practical skills will be tested in the lab through the development of a program that carries out a Machine Learning task on an assigned data set. The quality of the solution will be evaluated on the basis of the correctness of the approach, the correctness of the solution, and the quality of the code and of the documentation. To pass, the student must present a sensible approach and reasonable code. The weight of this part is 50%.
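
For orientation only, a program of this kind might follow the outline below. This is a hedged sketch and not the actual exam task: the Iris data set, the k-nearest neighbors classifier and the candidate values of `k` are assumptions standing in for whatever data set and method are assigned. It uses scikit-learn, which is listed in the bibliography.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data set (assumption): the real one is assigned in the lab.
X, y = load_iris(return_X_y=True)

# Hold out a test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Choose k by 5-fold cross-validation on the training set only.
best_k, best_cv = None, -1.0
for k in (1, 3, 5, 7, 9):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_score = cross_val_score(model, X_train, y_train, cv=5).mean()
    if cv_score > best_cv:
        best_k, best_cv = k, cv_score

# Refit the selected model on the whole training set, then evaluate once.
final_model = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=best_k)
)
final_model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))

print(f"best k = {best_k}, CV accuracy = {best_cv:.3f}, "
      f"test accuracy = {test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so no information from the validation folds or the test set leaks into preprocessing.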

It is also possible, on request, to have an oral examination, with possible outcomes -1="no answer", 0="some general knowledge", +1="correct answer", and weight 10%.

Module 2

The assessment will be administered with Virtuale and consists of 8 multiple choice questions plus one open question on the topics listed above. The duration is 30 minutes.

Additional details on assessment are available in the Virtual Learning Environment.

Teaching tools

Projection of slides made available before the lectures
Kahoot! [https://kahoot.com/] for class interaction
IoL (Moodle) for distribution of teaching materials, self-evaluation activities, exams
Python [https://www.python.org/]
Jupyter notebooks (Anaconda [https://www.anaconda.com/distribution/] distribution)
“This class is supported by DataCamp [https://www.datacamp.com/], the most intuitive learning platform for data science. Learn R, Python and SQL the way you learn best through a combination of short expert videos and hands-on-the-keyboard exercises. Take over 100+ courses by expert instructors on topics such as importing data, data visualization or machine learning and learn faster through immediate and personalised feedback on every exercise.”

Office hours

See the website of Claudio Sartori