95631 - MACHINE LEARNING AND DATA MINING

Academic Year 2023/2024

Learning outcomes

At the end of the course, the student knows and understands:

  • the principles and the most relevant use cases of a wide set of Machine Learning algorithms used to extract relevant and actionable information from large amounts of data;
  • the main steps of a Data Mining process, including choosing the Machine Learning methods best suited to the process and evaluating the quality of the result in order to support strategic and operational decisions;
  • the main concepts related to the management of large amounts of enterprise data, including Data Warehouse and Data Lake.

At the end of the course, the student is also able to develop a data mining process for simple datasets.

Course contents

Part 1 - Data Mining

  • Introduction to the Data Mining Process
  • Architectures of systems with data mining components
  • Enterprise Data Warehouse
  • Data Lake
  • Case studies

Part 2 - Machine Learning

  • What is Machine Learning: some history and motivating examples
  • Theory of learning
  • Supervised vs unsupervised learning
  • Classification and regression
  • Model Selection, validation and presentation of results
  • Regression
  • Classification with linear discrimination, decision trees, Bayesian inference, Support Vector Machines, k-nearest neighbors, logistic regression, random forests, adaboost
  • Ensemble learning, boosting, bagging
  • Association rules and the Apriori algorithm
  • Clustering/segmentation with k-means, dbscan, Expectation Maximization, hierarchical methods, kernel methods
  • Analysis of case studies
  • CRISP-DM methodology

Pre-requisites 

  • Fundamentals of programming
  • Fundamentals of calculus and linear algebra
  • Fundamentals of statistics and probabilities
  • Some general knowledge of Database Management Systems is useful

Readings/Bibliography

  • Introduction to machine learning / Ethem Alpaydin. - 3. ed. - Cambridge : The MIT Press, 2014. - XXII, 613 p. - link to the online version [https://ebookcentral.proquest.com/lib/unibo/detail.action?docID=3339851]
  • Scikit-Learn [https://scikit-learn.org/stable/documentation.html], or Python Data Science Handbook [https://jakevdp.github.io/PythonDataScienceHandbook/]
  • Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000); 5:13-22.

Teaching methods

Most lectures are held in traditional classrooms and rely on slides. Case studies based on open-source software are also presented.

Interaction is also encouraged through consultation tools such as Wooclap.

The laboratory activity will be an integral part of the learning process.

Assessment methods

Knowledge is verified through multiple-choice questions. To pass, the student must answer correctly at least half of the questions plus one. The weight of this part is 33%.

Abilities are verified in the lab through the development of a program that carries out a Machine Learning task on an assigned dataset. The solution is evaluated on the correctness of the approach, the correctness of the solution, and the quality of the code and of the documentation. To pass, the student must provide at least a sensible approach and reasonable code. The weight of this part is 67%.

On request, an oral examination is also possible. Its outcome ranges from -3 ("no answer") to +3 ("correct answer") and is added to the weighted sum of the two scores above.
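
As an illustration only, the scoring rules above can be sketched in Python. The function names are hypothetical, the two partial scores are assumed to share one numeric scale (the text does not state it), and reading "half + 1" with integer division for an odd number of questions is also an assumption. The official rules are those published on the course page.

```python
KNOWLEDGE_WEIGHT = 0.33  # multiple-choice part, as stated in the text
ABILITIES_WEIGHT = 0.67  # lab part, as stated in the text


def quiz_passed(correct: int, total: int) -> bool:
    """Return True if at least half of the questions plus one are correct.

    Assumption: "half" is read as total // 2, so the rounding used for an
    odd number of questions is a guess, not a rule from the course.
    """
    return correct >= total // 2 + 1


def final_score(knowledge: float, abilities: float, oral: float = 0.0) -> float:
    """Weighted sum of the two parts plus the optional oral adjustment.

    Assumption: both partial scores are expressed on the same scale.
    """
    if not -3 <= oral <= 3:
        raise ValueError("oral adjustment must be between -3 and +3")
    return KNOWLEDGE_WEIGHT * knowledge + ABILITIES_WEIGHT * abilities + oral


if __name__ == "__main__":
    # Hypothetical scores, chosen only to show the arithmetic.
    print(quiz_passed(11, 20))
    print(round(final_score(24, 27, oral=2), 2))
```

With these hypothetical inputs, the oral adjustment is simply added on top of 0.33 × 24 + 0.67 × 27; it is not weighted.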

Additional details on assessment are available on the course page at https://virtuale.unibo.it

For students of Machine Learning and Deep Learning (integrated course), the grade will be registered only after the Deep Learning module has been passed.


Teaching tools

  • Projection of slides made available before the lectures
  • Wooclap for class interaction
  • https://virtuale.unibo.it for distribution of teaching materials, self-evaluation activities, forums
  • Python [https://www.python.org/]
  • Jupyter notebooks (Anaconda [https://www.anaconda.com/distribution/] or Google Colab [https://colab.research.google.com/])
  • This class is supported by DataCamp [https://www.datacamp.com/], a learning platform for data science. DataCamp statement: "Learn R, Python and SQL the way you learn best through a combination of short expert videos and hands-on-the-keyboard exercises. Take over 100+ courses by expert instructors on topics such as importing data, data visualization or machine learning and learn faster through immediate and personalised feedback on every exercise."

Links to further information

https://virtuale.unibo.it

Office hours

See the website of Claudio Sartori

SDGs

  • Quality education
  • Industry, innovation and infrastructure

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.