90733 - DATA ANALYTICS

Academic Year 2020/2021

  • Modules: Marco Di Felice (Module 1), Giuseppe Lisanti (Module 2)
  • Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
  • Campus: Bologna
  • Programme: Second cycle degree programme (LM) in Computer Science (cod. 8028)

Learning outcomes

At the end of the course, the student: (i) is aware of the different types of data analytics (diagnostic, predictive, prescriptive, etc.) and of the main enabling techniques; (ii) is able to design and implement a full data pipeline, from data acquisition to data analysis and valorization; (iii) knows the main applications of data analytics, with a special emphasis on industrial and business applications.

Course contents

The course introduces concepts, techniques and tools for the design and implementation of data valorization and analytics processes. To this end, it covers all the stages of a digital data pipeline: data acquisition, pre-processing, knowledge extraction through statistical and Machine Learning techniques, data visualization and performance evaluation. In addition, it discusses state-of-the-art applications of data analytics to business use cases and to technological scenarios with high industrial impact that are enabled by the availability of big data (e.g. IoT and Industry 4.0).

After a brief recap of Python programming concepts (including the libraries for data processing), the course illustrates each stage of the data pipeline in sequence. First, we review the essential techniques of data acquisition (data querying, APIs, Web scraping, etc.) and the main architectures for data streaming and pipelining (e.g. AWS DP). Next, we illustrate the pre-processing techniques for data filtering/cleaning, feature selection/transformation and dimensionality reduction. We then present the most common data visualization techniques, which serve both to show the outcomes and to support the design choices of the analytics process, together with their implementations in Python libraries.

A key component of the data pipeline, and hence of the course, is automatic knowledge extraction from datasets; to this end, we present in detail the most widely used Data Mining/Machine Learning techniques, based on supervised and unsupervised approaches, and their implementations in Python frameworks (e.g. scikit-learn, PyTorch). Finally, we address metrics and methodologies for assessing the performance of the data analytics process.
We conclude the course with seminars on relevant applications of data analytics in business/industrial use cases, with the planned participation of external companies working in the field. The course contents are listed below:


  • Recap of Python programming and Python libraries for data science (Pandas, NumPy, etc.)

  • Stages of the data-pipeline:

    • Data acquisition: techniques and architectures

    • Data pre-processing: cleaning, transformation, dimensionality reduction, feature extraction, etc.

    • Data visualization: marks and channels, separability, graph types

    • Modeling

      • Basic concepts (classification vs regression, overfitting vs underfitting, generalization, regularization, etc.)

      • Supervised approaches (e.g. KNN, SVM, introduction to neural networks)

      • Unsupervised approaches (e.g. k-Means, Gaussian mixture models)

    • Performance analysis: metrics, evaluation methods, hyperparameter optimization

  • Data analytics applications in business/industrial use-cases
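
As a rough illustration of how the pipeline stages listed above fit together, the following minimal sketch chains acquisition, pre-processing, modeling and performance analysis in plain Python. It is not course material: the synthetic dataset, the function names (acquire, fit_scaler, transform, knn_predict, run_pipeline) and the 70/30 hold-out split are all assumptions made for illustration, and in practice the libraries named in this syllabus (Pandas, NumPy, scikit-learn) would replace the hand-written steps.

```python
import math
import random
from collections import Counter


def acquire(n_per_class=50, seed=0):
    """Acquisition: build a toy two-class dataset (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    rows = []
    # Two Gaussian blobs whose features live on very different scales.
    for label, (cx, cy) in enumerate([(0.0, 0.0), (4.0, 40.0)]):
        for _ in range(n_per_class):
            rows.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 10.0)), label))
    rng.shuffle(rows)
    return rows


def fit_scaler(points):
    """Pre-processing: compute per-feature mean and standard deviation."""
    n = len(points)
    dims = range(len(points[0]))
    means = [sum(p[d] for p in points) / n for d in dims]
    # Fall back to 1.0 for a constant feature to avoid division by zero.
    stds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
            for d in dims]
    return means, stds


def transform(point, scaler):
    """Pre-processing: standardize one point with a previously fitted scaler."""
    means, stds = scaler
    return tuple((x - m) / s for x, m, s in zip(point, means, stds))


def knn_predict(train, query, k=5):
    """Modeling: majority vote among the k nearest training points (KNN)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def run_pipeline(k=5, seed=0):
    """Run all stages and return the hold-out accuracy (performance analysis)."""
    data = acquire(seed=seed)
    split = int(0.7 * len(data))
    train_raw, test_raw = data[:split], data[split:]
    # Fit the scaler on the training part only, so no test information leaks in.
    scaler = fit_scaler([p for p, _ in train_raw])
    train = [(transform(p, scaler), y) for p, y in train_raw]
    test = [(transform(p, scaler), y) for p, y in test_raw]
    hits = sum(knn_predict(train, p, k) == y for p, y in test)
    return hits / len(test)


if __name__ == "__main__":
    print(f"hold-out accuracy: {run_pipeline():.2f}")
```

Fitting the scaler on the training split alone is a deliberate choice in this sketch: it keeps the evaluation stage independent of the data used for modeling, which is the same concern that motivates the evaluation methods listed under performance analysis.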

Readings/Bibliography

All the slides of the course will be made available on the IOL platform. There is no official textbook; based on the topics presented in class, the lecturers will recommend specific sections of the following books:

  • Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2016

  • Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Series in Data Management Systems

Further readings might be recommended by the lecturers during the course.


Teaching methods

Teaching methods include lectures and exercises. The exercises will be carried out mainly in Python; the lecturers will provide the datasets, code snippets and solutions on the IOL platform. In addition, business seminars will be scheduled in the last week of the course.

Assessment methods

The assessment consists of two parts: a project and an oral exam. The project consists of the design and implementation of a data analytics process covering all the stages of the data pipeline presented during the course. The project topic can be proposed either by the students (subject to approval by the lecturers) or by the lecturers. The project can be developed by a single student or by a group of at most two students. After the project discussion, there is an oral exam with theoretical questions on the topics presented during the course. The final grade is computed as the average of the two grades. To encourage participation and interaction during the lectures, bonus exercises (challenges or individual submissions) may be assigned as optional homework; submitted bonus exercises will contribute to the final grade (the average of the project and oral exam grades).

Teaching tools

The teaching tools include: slides, personal computer, projector, datasets (made available on the IOL platform).

Office hours

See the website of Marco Di Felice

See the website of Giuseppe Lisanti

SDGs

  • Industry, innovation and infrastructure
  • Sustainable cities
  • Responsible consumption and production

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.