90477 - Machine Learning Systems For Data Science

Course Unit Page

SDGs

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Industry, innovation and infrastructure

Academic Year 2021/2022

Learning outcomes

By the end of the course, the student should:
- understand the fundamentals of supervised and unsupervised machine learning algorithms, focusing on deep learning algorithms;
- understand the fundamental programming principles of the Python language and be able to apply them primarily to data management and analysis, under the umbrella of data science;
- understand the role, purpose and features of Python libraries for numerical computation, data representation, and machine learning, and how they integrate with frameworks such as Jupyter Notebook;
- be able to apply data science practices and methods to construct models and solve problems for various data science applications.

Course contents

Module 1

The Python language
══════════════════
Expressions, tuples, lists, comprehensions, sets, dictionaries. Loops and branching statements. The NumPy and pandas packages.
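
As a brief illustration of the language constructs listed above, the following sketch uses only the standard library (NumPy and pandas are not shown here). The sample data and all variable names are invented for the example.

```python
# Hypothetical sample data: (sensor name, temperature) tuples in a list.
readings = [("a", 21.5), ("b", 19.0), ("a", 22.5), ("c", 18.0), ("b", 20.0)]

# Set comprehension: the distinct sensor names.
sensors = {name for name, _ in readings}

# Dictionary filled with a loop and a branching statement.
by_sensor = {}
for name, value in readings:
    if name not in by_sensor:
        by_sensor[name] = []
    by_sensor[name].append(value)

# Dictionary comprehension: mean temperature per sensor.
means = {name: sum(vals) / len(vals) for name, vals in by_sensor.items()}

# List comprehension with a condition: sensors whose mean exceeds 20.
warm = [name for name, mean in sorted(means.items()) if mean > 20]

print(sensors, means, warm)
```

With this data, only sensor "a" has a mean above 20, so `warm` contains a single name.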


Machine Learning and advanced analytics
══════════════════════════════

Association rule discovery
──────────────────────────────
Classification of association rules
Apriori algorithm
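
The level-wise search performed by the Apriori algorithm can be sketched as follows. This is a minimal, unoptimized illustration, not the implementation used in the course: the function name `apriori`, the absolute `min_support` count, and the small market-basket data set are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count = number of transactions containing the candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(transactions, min_support=3)

# Confidence of the rule {beer} -> {diapers}: support(both) / support(beer).
confidence = (
    frequent_itemsets[frozenset({"beer", "diapers"})]
    / frequent_itemsets[frozenset({"beer"})]
)
```

Here four single items and four pairs are frequent, while the only surviving 3-item candidate occurs in just two transactions and is discarded.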

Data clustering
───────────────────
The leader-follower algorithm
The BIRCH algorithm
The K-means algorithm
The EM algorithm
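
Of the clustering algorithms listed above, K-means is the simplest to sketch. The version below is a plain-Python illustration of the standard assignment/update loop; the function name `kmeans`, the choice of seeding the centroids with the first k points, and the sample points are assumptions made for the example.

```python
def kmeans(points, k, iterations=100):
    """Basic K-means on tuples of floats; returns (centroids, labels)."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
    return centroids, labels


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic; the fixed seeding here only keeps the example deterministic.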


Supervised classification
─────────────────────────────
Classification trees: C4.5, CART
Support Vector Machines (SVM)
AdaBoost
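
To give a flavour of how a classification tree such as CART chooses a split, the sketch below evaluates candidate thresholds on one numeric feature using Gini impurity. It covers only the split criterion, not tree growing or pruning; the helper names `gini` and `best_split` and the sample data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = (None, gini(labels))  # baseline: no split at all
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical one-feature data set, invented for illustration.
values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(values, labels)
```

For this data the midpoint 4.5 separates the two classes perfectly, so the weighted impurity of the best split is zero.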

 

Deep Neural Networks
─────────────────────────────
Convolutional neural networks, LSTM networks, autoencoders.


Laboratory classes
══════════════════════════

Integrated development environments for Python
Advanced analytics

Module 2

The course focuses on the paradigm and fundamental characteristics of Python as a programming language suited to data manipulation in the data science field. The emphasis is on exploring its libraries, which assist in reading and writing data and in grouping, aggregating, merging and joining data frames, and thus enable data visualization and analysis. The practical part of the course involves the use of tools and development platforms, such as Jupyter Notebook and GitLab, for sharing and supporting data analysis. The course also includes access to various data sets, used to illustrate the applicability of the material to real-life examples.

Part 1 – Data management and representation in Python

Techniques and methods for structuring and visualizing data.

Using DataFrame and Series, and running basic statistical analysis.

Applicability and functionality of libraries such as matplotlib, seaborn, and plotly.

Data preparation for statistical analysis.
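
The DataFrame and Series operations named in this part (basic statistics, grouping, aggregation, merging) can be illustrated with a short pandas sketch. The data, column names and variable names below are invented for the example.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
sales = pd.DataFrame({
    "store": ["north", "south", "north", "south"],
    "units": [10, 4, 6, 8],
})
stores = pd.DataFrame({
    "store": ["north", "south"],
    "city": ["Bologna", "Rimini"],
})

# A single column of a DataFrame is a Series with its own statistics.
units = sales["units"]
mean_units = units.mean()

# Grouping and aggregation: total units per store.
totals = sales.groupby("store", as_index=False)["units"].sum()

# Merging: attach the city of each store to the totals.
report = totals.merge(stores, on="store", how="left")
print(report)
```

The resulting `report` has one row per store with the columns `store`, `units` and `city`.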

Part 2 – Machine Learning in Python

Machine Learning (ML) techniques.

Demonstration of supervised and unsupervised ML approaches.

Introduction to libraries such as scikit-learn, TensorFlow and NLTK.

Readings/Bibliography

Module 1

By subject:

Python

Recommended book:

Parker, J. R. (2016). Python: An Introduction to Programming. Mercury Learning & Information. E-book, free to download with student institutional credentials; searchable at http://sba.unibo.it | Online resources | E-books | Ricerca un e-book nel Catalogo A-Link

Association rule discovery, data clustering, supervised classification

Optional book:

Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston: Pearson.

Machine learning and neural networks

Recommended book:

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. Freely available at https://www.deeplearningbook.org/

Module 2

Online materials and other suggested readings will be indicated during the course.

Teaching methods

NOTE: Given the teaching methods of this course unit, all students must attend Modules 1 and 2 on Health and Safety online.

Module 1

The lessons of the course are divided into:
• lectures in a lecture room
• lessons in a laboratory, each comprising both lecture-style presentations and exercises on techniques for solving data analysis problems.

The topics of the course will be divided by lesson type:
• The theoretical and practical notions of advanced analytics are explained in lectures.
• In laboratory lessons, students are encouraged to design and build advanced analytics and machine learning models using the Python programming language.

Module 2

  • Theoretical lessons in the classroom
  • Tutorials in the lab

During classes, students will be guided in implementing and practising the concepts presented.

If possible, seminars on specific topics of interest will be organized.


Assessment methods

Module 1

The examination is composed of three parts.

Python programming

The student is given a digital text on Esami OnLine describing a simple analysis problem, and must produce on Esami OnLine a Python program that solves it.

  • Reading books and printed notes is allowed.

Multiple choice test

The student is given a collection of 15 sentences, each of which has 3 possible completions, only one of which is correct. The test is performed entirely on Esami OnLine.

  • Reading books and printed notes is allowed.

Oral examination

The student must answer three questions, which may concern any part of the course contents. In particular, the student must show: mastery of the theoretical notions of the discipline and of the logical, set-theoretic, and mathematical formalism employed in it; knowledge of the advanced analytics and machine learning techniques presented during lessons and implemented in the tools used during lessons, together with the ability to use such tools; knowledge of the Python language.

Computation of the mark of the module and validity of the parts

The mark for each part ranges from 0 to 30, inclusive.

The marks achieved in the Python programming part and the multiple choice test part are valid until the end of the session (June-July or September) in which the part has been taken.

The assessment of the overall outcome of the module and the computation of the final mark of the module take place at the end of the oral examination.

The final mark of the module is computed as the average of the latest marks achieved in the Python programming part, in the multiple choice test part, and in the oral examination.
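
As a worked example of this rule, the sketch below averages three part marks. The function name `module_mark` and the sample marks are hypothetical, and because no rounding rule is stated here, the result is left unrounded.

```python
def module_mark(python_mark, quiz_mark, oral_mark):
    """Average of the three part marks, each expected in the range 0 to 30."""
    marks = (python_mark, quiz_mark, oral_mark)
    if not all(0 <= m <= 30 for m in marks):
        raise ValueError("each mark must lie between 0 and 30")
    # Plain arithmetic mean; any rounding rule is not specified in the text.
    return sum(marks) / len(marks)


# Hypothetical marks for the three parts.
example = module_mark(27, 24, 30)
```

With marks of 27, 24 and 30, the average is 27.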

Module 2

Students will be assessed through two different types of assessments.

Assignment 1 (Group work)

In their group work, which focuses on a programming project, students will demonstrate their ability to analyse a given data set using Python libraries, answer the questions posed in the assigned tasks, and share the group's results on a public online repository such as GitLab.

Assignment 2 (Individual assessment as oral examination)

In their individual assessment, students will present the results of their work on Assignment 1 (the programming project) in the form of a presentation, during which each student will discuss issues related to the project and answer relevant questions.


Teaching tools

Module 1

Presentation of the course topics using an overhead projector
Laboratory with desktop PCs equipped with Anaconda; teacher's PC connected to an overhead projector to guide laboratory exercises
Documents used in the presentations, distributed on the site http://virtuale.unibo.it. Access to the documents is allowed only to students of the course.

Module 2

Course notes. Open source projects used as teaching examples.

Office hours

See the website of Stefano Lodi

See the website of Elisabetta Ronchieri