75194 - Data Mining M

Academic Year 2019/2020

Learning outcomes

At the end of the course, students know the principles and main use cases of data mining algorithms. They are able to understand and apply a wide set of analysis algorithms to extract useful relationships from large datasets. They can also design a process of data selection, transformation, analysis and interpretation to support strategic decisions.

Course contents

Module 1 - Machine Learning

  • What is Machine Learning: some history and motivating examples
  • Theory of learning
  • Supervised vs unsupervised learning
  • Classification and regression
  • Model selection, validation and presentation of results
  • Regression
  • Classification with linear discrimination, decision trees, Bayesian inference, Support Vector Machines, k-nearest neighbors, logistic regression, random forests, AdaBoost
  • Ensemble learning, boosting, bagging
  • Association rules and the Apriori algorithm
  • Clustering/segmentation with k-means, DBSCAN, Expectation-Maximization, hierarchical methods, kernel methods
  • Analysis of case studies
  • Architectures of systems with data mining components
  • CRISP-DM methodology

Module 2 - Big Data (only for the students of "Data Mining", Master program in Computer Engineering, lessons from 20/09 to 15/10/2019)

(Big) Data Revolution

  • The big data wave and the "Big Data hype"
  • Technological enablers
  • Fundamentals of Big Data
  • (Big) Data Types

Big Data: a new paradigm

  • New roles and opportunities
  • Organization models
  • Technologies
  • Methodologies

Big Data architectures

  • NoSQL
  • Hadoop
  • Hadoop Ecosystem
  • Enterprise Data Warehouse
  • Data Lake


Readings/Bibliography

Teaching methods

Most lectures are held in "traditional" classrooms and make use of slides. Case studies based on open-source software are also presented.

Class interaction is also encouraged through live polling tools such as Kahoot!

Laboratory activities are an integral part of the learning process.

Assessment methods

Module 1 (for all students)

Understanding of the theoretical and practical notions is tested through multiple-choice questions administered with the IoL system. To pass, students must answer at least half of the questions plus one correctly. This part accounts for 50% of the grade.

Practical skills are tested in the lab through the development of a program that performs a Machine Learning task on an assigned data set. The solution is evaluated on the correctness of the approach, the correctness of the solution, and the quality of the code and documentation. To pass, the solution must follow a sensible approach and be reasonably coded. This part accounts for 50% of the grade.

On request, students may also take an oral examination, with possible outcomes -1 = "no answer", 0 = "some general knowledge", +1 = "correct answer", and a weight of 10%.

Module 2 (only for students of the Master program (LM) in Computer Engineering)

The assessment is administered with IoL and consists of 8 multiple-choice questions plus one open question on the topics listed above. It lasts 30 minutes. The grade is combined with that of Module 1 with a weight of 25%.

Teaching tools

  • Projection of slides made available before the lectures
  • Kahoot! for class interaction
  • IoL (Moodle) for distribution of teaching materials, self-evaluation activities, exams
  • Python
  • Jupyter notebooks (Anaconda distribution)
  • “This class is supported by DataCamp [https://www.datacamp.com/], the most intuitive learning platform for data science. Learn R, Python and SQL the way you learn best through a combination of short expert videos and hands-on-the-keyboard exercises. Take over 100+ courses by expert instructors on topics such as importing data, data visualization or machine learning and learn faster through immediate and personalised feedback on every exercise.”


Office hours

See the website of Claudio Sartori

SDGs

Quality education

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.