- Teacher: Matteo Golfarelli
- Credits: 6
- SSD: ING-INF/05
- Language: English
- Modules: Matteo Golfarelli (Module 1), Gianluca Moro (Module 2)
- Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
- Campus: Cesena
- Course: Second cycle degree programme (LM) in Computer Science and Engineering (cod. 8614). Also valid for Second cycle degree programme (LM) in Digital Transformation Management (cod. 5815)
- from Sep 23, 2024 to Dec 16, 2024
- from Sep 19, 2024 to Dec 12, 2024
Course contents
The course is organized in two modules. The first is shared by students from Ingegneria e Scienze Informatiche (ISI) and students from Digital Transformation Management (DTM). The second module is specific to each degree.
------------ Module I: Data Mining (ISI + DTM)
1. Introduction to Data Mining: areas of applicability
2. The knowledge discovery process
- Designing a Data Mining Process
- The CRISP-DM methodology
3. Understanding and preparing data
- Features of different data types
- Statistical data analysis
- Data quality
- Preprocessing: attributes selection and creation
- Measuring similarities and dissimilarities
4. Data mining techniques
- Classification through decision trees and Bayesian networks
- Association rules
- Clustering
- Outlier detection
5. Data understanding and validation
6. The Weka software [http://www.cs.waikato.ac.nz/ml/weka/]
7. Case studies analysis
------------ Module II ISI: Text Mining (Prof. Gianluca Moro)
1. Text Representation, Retrieval, Classification and Opinion Mining
2. Language Models, Transformers, Efficient Attention Mechanisms, Retrieval-Augmented Generation with VectorDB
3. Vision-Language Models, Graph Machine Learning and Neuro-symbolic Methods for Trustworthy AI
4. Generative Large Language Models, Prompt Engineering, Evaluation Methods, Compression and Quantization, Parameter-Efficient Fine-Tuning (e.g. QLoRA, Prompt Tuning) and Large Action Models
5. Labs on real-world case studies - Text Summarization, Information Extraction, Chatbot and Function Call Generation etc. - with Python and open source tools: Hugging Face framework, LangChain, LlamaIndex, Milvus etc.
------------ Module II DTM: Machine Learning (Prof. Matteo Francia)
1. Introduction to Neural Networks & Deep Learning
2. Theory and application of the CRISP-DM methodology to real datasets and use cases
- Data acquisition and processing
- Preprocessing and Feature Extraction
- Modeling
- Output understanding
Readings/Bibliography
------------ Module I: Data Mining (ISI + DTM)
Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Pearson International, 2006.
------------ Module II ISI: Text Mining (Prof. Gianluca Moro)
The teaching material is provided by the instructor on the course’s “virtual” portal, as the field of Natural Language Processing is rapidly evolving. The following bibliographic references are freely accessible and serve as suggested material for certain chapters covered in some lectures:
Gerhard Paaß, Sven Giesselbach. Foundation Models for Natural Language Processing: Pre-trained Language Models Integrating Media. Open access, 2023 (freely available)
Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft, 2024)
GitHub repository with code, datasets, models and references to the state of the art
Christopher Manning, Hinrich Schütze, Prabhakar Raghavan. Introduction to Information Retrieval. Cambridge University Press, 2008
Teaching methods
Lessons and practical exercises
Assessment methods
Oral examination and discussion of a project. The project must be agreed with one of the two lecturers and can be either the implementation of a mining algorithm or the analysis of a dataset using data and text mining techniques.
The goal of the assessment is to verify comprehension of the studied techniques, as well as the practical ability to analyze data and to discover and understand the information hidden in it.
Grades are assigned on the basis of an overall assessment of knowledge, skills, and the presentation and discussion of the topics covered. The grade ranges can be described as follows:
18-23: the student has sufficient preparation and analytical skills, limited however to just a few of the topics taught in the course; the overall jargon is correct
24-27: the student shows adequate preparation at a technical level, with some doubts over the topics; good, though not very articulate, analytical skills, with correct use of jargon
28-30: great knowledge of most of the topics taught in the course, good critical and analytical skills, good usage of the specific jargon
30L: excellent and in depth knowledge of all the topics in the course, excellent critical and analytical skills, excellent usage of specific jargon.
Teaching tools
Practical exercises will be carried out using the open-source tools Weka, R and Python (Colab)
Links to further information
http://bias.csr.unibo.it/golfarelli/
Office hours
See the website of Matteo Golfarelli
See the website of Gianluca Moro