90733 - Data Analytics

Academic Year 2025/2026

  • Modules: Marco Di Felice (Module 1); Giuseppe Lisanti (Module 2)
  • Teaching Mode: Traditional lectures (Module 1); Traditional lectures (Module 2)
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Computer Science (code 5898)

Learning outcomes

At the end of the course, the student: (i) is aware of the different types of data analytics (diagnostic, predictive, prescriptive, etc.) and of the main enabling techniques; (ii) is able to design and implement a full data pipeline, from data acquisition to data analysis and valorization; (iii) knows the main applications of data analytics, with a special emphasis on industrial and business applications.

Course contents

The course introduces concepts, techniques and tools for the design and implementation of data valorization and analytics processes. To this end, it covers all the stages of a digital data pipeline: data acquisition, pre-processing, knowledge extraction through statistical and Machine Learning techniques, data visualization and performance evaluation. In addition, it discusses state-of-the-art applications of data analytics to business use cases and to technological scenarios with high industrial impact that are enabled by the availability of big data (e.g. IoT and Industry 4.0).

After a brief recap of Python programming concepts (including the libraries for data processing), the course illustrates each stage of the data pipeline in sequence. First, we review the essential techniques of data acquisition (data querying, APIs, Web scraping, etc.) and the main architectures for data streaming and pipelining (e.g. AWS DP). Next, we illustrate the pre-processing techniques for data filtering/cleaning, feature selection/transformation and dimensionality reduction. We then present the most common data visualization techniques, which serve both to show the outcomes and to support the design choices of the analytics process, together with their implementations in Python libraries. A key component of the data pipeline, and hence of the course, is automatic knowledge extraction from datasets; to this end, we present in detail the most widely used Machine Learning and Deep Learning techniques, based on supervised and unsupervised approaches, and their implementations in Python frameworks (e.g. scikit-learn, PyTorch). Finally, we address metrics and methodologies for assessing the performance of the data analytics process.

The course concludes with seminars on relevant applications of data analytics in business/industrial use cases, with the planned participation of external companies working in the field. The course contents are listed below:


  • Recap of Python programming and Python libraries for data science (pandas, NumPy, etc.)

  • Stages of the data-pipeline:

    • Data acquisition: techniques and architectures

    • Data pre-processing: cleaning, transformation, dimensionality reduction, feature extraction, etc.

    • Data visualization: marks and channels, separability, graph types

    • Modeling

      • Basic concepts (classification vs regression, overfitting vs underfitting, generalization, regularization, etc.)

      • Supervised approaches (e.g. KNN, SVM, introduction to neural networks)

      • Unsupervised approaches (e.g. k-means, Gaussian mixture models)

    • Performance analysis: metrics, evaluation methods, hyperparameter optimization

  • Data analytics applications in business/industrial use-cases

Readings/Bibliography

All course slides are made available on the Virtuale platform.
There is no single required textbook; for specific sections of the course (indicated by the instructors as needed), the following readings are recommended:

  • Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013

  • Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2016

  • Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Series in Data Management Systems

Additional readings may be suggested by the instructors throughout the course, depending on the topics covered.

Teaching methods

Teaching methods include lectures and exercises. The exercises will be carried out mainly in Python; the lecturers will provide the datasets, code snippets and solutions on the Virtuale platform. In addition, business seminars will be scheduled in the last week of the course.

Assessment methods

Student assessment consists of two components: a project and an oral exam.


The project involves the design and implementation of a data analytics pipeline, covering all phases presented during the course and using the tools introduced in class (PyTorch and scikit-learn).
The project topic may be chosen by students (subject to approval) or proposed by the instructor during the course.
The project can be completed individually or in groups of up to 2 students.

Following the project discussion, students take an oral exam with theoretical questions on the topics covered during the course.
The final grade is the average of the grades obtained in the project and the oral exam.
The oral exam and project discussion take place on the same date, published via the ALMAESAMI portal.
Students must register for the exam session through the same portal, and must submit the project source code and report via the Virtuale platform at least 3 days before the scheduled exam/discussion date.

To encourage participation and interaction during the semester, optional bonus exercises may be assigned in the form of small challenges or individual take-home assignments.
Submitting these exercises can contribute to the final grade (the average of the project and oral exam grades).

Teaching tools

The teaching tools include: slides, personal computer, projector, and datasets (made available on the Virtuale platform).

Office hours

See the website of Marco Di Felice

See the website of Giuseppe Lisanti

SDGs

Quality education; Industry, innovation and infrastructure

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.