- Teacher: Stefano Lodi
- Credits: 10
- SSD: ING-INF/05
- Language: English
- Modules: Stefano Lodi (Module 1), Elisabetta Ronchieri (Module 2)
- Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
- Campus: Bologna
- Course: Second cycle degree programme (LM) in Statistical Sciences (cod. 9222)
Also valid for Second cycle degree programme (LM) in Statistics, Economics and Business (cod. 8876)
- from Sep 18, 2023 to Oct 23, 2023
- from Nov 14, 2023 to Dec 07, 2023
Learning outcomes
By the end of the course, the student should:
- understand the fundamentals of supervised and unsupervised machine learning algorithms, focusing on deep learning algorithms
- understand the fundamental programming principles of the Python language and be able to apply them primarily to data management and analysis, under the umbrella of data science
- understand the role, purpose and features of Python libraries for numerical computation, data representation, and machine learning, and their interconnectivity with frameworks such as Jupyter Notebook
- be able to apply data science practices and methods to construct models and solve problems for various data-science applications.
Course contents
Module 1
The Python language
══════════════════
Expressions, tuples, lists, comprehensions, sets, dictionaries. Repetitive and branching instructions. The NumPy and pandas packages.
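As an illustration of the constructs listed above, a minimal sketch follows. The sample data, the threshold value and all variable names are invented for this example and are not taken from the course material.

```python
# Hypothetical sample data: a list of (name, score) tuples.
records = [("ada", 28), ("bob", 17), ("eve", 30), ("bob", 24)]

# Illustrative threshold, chosen only for this example.
threshold = 18

# List comprehension with a branching condition: keep scores at or above the threshold.
selected = [score for _, score in records if score >= threshold]

# Set comprehension: the distinct names appearing in the records.
names = {name for name, _ in records}

# Repetitive instruction filling a dictionary: later records overwrite earlier ones.
latest = {}
for name, score in records:
    latest[name] = score

print(selected)
print(sorted(names))
print(latest)
```

The same selections could be written with NumPy arrays or pandas data frames; plain built-in types are used here only to keep the sketch self-contained.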
Machine Learning and advanced analytics
══════════════════════════════
Association rule discovery
──────────────────────────────
Apriori algorithm
Data clustering
───────────────────
Algorithms: leader-follower, BIRCH, k-means, EM
Supervised classification
─────────────────────────────
Classification trees C4.5, CART
Support Vector Machines (SVM)
AdaBoost
Deep Neural Networks
─────────────────────────────
Dense networks, convolutional neural networks, LSTM networks, autoencoders.
Laboratory classes
══════════════════════════
Integrated development environments for Python
Advanced analytics
Module 2
The course focuses on the paradigm and fundamental characteristics of Python as a programming language suitable for data manipulation within the data science field. The emphasis is on exploring its libraries, which assist in reading and writing data and in grouping, aggregating, merging and joining data frames, and thus enable data visualization and analysis. The practical part of the course involves the use of tools and development platforms, such as Jupyter Notebook and GitLab, for sharing and supporting data analysis. The course also includes access to various data sets to illustrate the applicability of the material with real-life examples.
Part 1 – Data management and representation in Python
Techniques and methods for structuring and visualization of data.
Using DataFrame and Series, and running basic statistical analysis.
Applicability and functionality of libraries such as matplotlib, seaborn, and plotly.
Data preparation for statistical analysis.
Part 2 – Machine Learning in Python
Machine Learning (ML) techniques.
Demonstration of supervised and unsupervised ML approaches.
Introduction to libraries, such as scikit-learn, TensorFlow and nltk.
Readings/Bibliography
Module 1
Course slides and exercises are available on Virtuale.
By subject:
Python
Recommended book:
Parker, J. R. (2016). Python: An Introduction to Programming. Mercury Learning & Information. The e-book is free to download using student institutional credentials and can be found at http://sba.unibo.it | Online resources | E-books | Ricerca un e-book nel Catalogo A-Link
Association rule discovery, data clustering, supervised classification
Optional book:
Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston: Pearson.
Machine learning and neural networks
Recommended book:
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press. Freely available at https://www.deeplearningbook.org/
Module 2
Online materials and other suggested readings will be indicated during the course.
Teaching methods
NOTE: Given the teaching methods of this course unit, all students must attend Modules 1 and 2 on Health and Safety online.
Module 1
The lessons of the course are divided into
• lectures in a lecture room
• lessons in a laboratory, each comprising both lecture-style presentations and exercises on techniques for solving data analysis problems.
The topics of the course are divided by lesson type:
• The theoretical and practical notions of advanced analytics and machine learning are explained in lectures
• In laboratory lessons, students implement scripts for advanced analytics and machine learning using the Python programming language.
Module 2
- Theoretical lessons in the classroom
- Tutorials in the lab
During the classes, students will be guided in implementing and practising the concepts presented.
If possible, seminars on specific topics of interest will be organized.
Assessment methods
Attendance does not contribute to the assessment in any way.
Module 1
The examination is composed of three parts.
Python programming
The student is given a digital text on Esami OnLine, containing the description of a simple analysis problem; the student must produce on Esami OnLine a Python program solving the analysis problem.
- Reading books and bound notes is allowed.
Multiple choice test
The student is given a collection of 15 sentences, each of which has 3 possible completions, only one of which is correct. The test is performed entirely on Esami OnLine.
- No reading material is allowed.
Oral examination
The student must answer three questions, which may concern any part of the course contents. In particular, the student must show: mastery of the theoretical notions of the discipline and of the logical, set-theoretic, and mathematical formalism employed in it; knowledge of the elements of the advanced analytics and machine learning techniques which were presented during lessons and implemented in the tools used during lessons, and the ability to use such tools; knowledge of the Python language.
Computation of the grade of the module and validity of the parts
The grades of all parts are contained in the interval from 0 to 30, including the minimum and maximum.
The grades achieved in the Python programming part and the multiple choice test part are valid until the end of the exam period (there are three exam periods: January-February, June-July, and September) in which the part has been taken.
The assessment of the overall outcome of the module and the computation of the final grade of the module take place at the end of the oral examination.
The final grade of the module is computed as the average of the latest grades achieved in the Python programming part, in the multiple choice test part, and in the oral examination.
Module 2
Students will be assessed through three different assignments.
Assignment 1 (Group work)
In their group work, which focuses on a programming project, students will demonstrate their ability to analyse an assigned data set using Python libraries, answer questions on given tasks, and share the results of the group project on a public online repository, such as GitLab.
Assignment 2 (Group work)
The group work of Assignment 2 is similar to the group work of Assignment 1 and complements it: it is a programming project covering the parts of Module 2's program which have not been dealt with in Assignment 1.
Assignment 3 (Individual assessment as oral examination)
In their individual assessment, students will present the results of their work on Assignment 1 and Assignment 2 in the form of a presentation, in which each student will discuss issues related to the projects and answer relevant questions.
Computation of Module 2's grade and deadlines
- The grade of Module 2 belongs to the interval from 0 to 30, minimum and maximum included.
- The project must be submitted no later than 3 days before the date of the oral presentation of Assignment 3.
Final assessment and grade
The final grade is assigned only if all examination parts of both modules are taken within a single exam period (there are three exam periods: January-February, June-July, and September). Within a single exam period, parts may be taken in different calls.
A part can be taken more than once; note, however, that only the most recent result will be considered for the computation of the final grade.
The final grade is the arithmetic mean of the grades of the modules.
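The grading rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample grades are hypothetical, the Module 2 grade is taken as given because its computation is not detailed here, and no rounding is applied because no rounding rule is stated.

```python
from statistics import mean


def latest(attempts):
    """Return the most recent grade from a chronological list of attempts."""
    return attempts[-1]


def module1_grade(programming, multiple_choice, oral):
    """Average of the latest grades of the three parts of Module 1."""
    return mean([latest(programming), latest(multiple_choice), latest(oral)])


def final_grade(module1, module2):
    """Arithmetic mean of the two module grades."""
    return mean([module1, module2])


# Hypothetical example: the programming part was taken twice, so only 27 counts.
m1 = module1_grade(programming=[22, 27], multiple_choice=[24], oral=[30])
final = final_grade(m1, module2=28)
print(m1, final)
```

Note that retaking a part replaces the earlier result even when the new grade is lower, since only the most recent attempt enters the average.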
Teaching tools
Module 1
Presentation of the course topics using an overhead projector
Laboratory with desktop PCs equipped with Anaconda; teacher's PC connected to an overhead projector to guide laboratory exercises
Documents used in the presentations, distributed on the site http://virtuale.unibo.it. Access to the documents is allowed only to students of the course.
Module 2
Course notes. Open source projects used as teaching examples.
Office hours
See the website of Stefano Lodi
See the website of Elisabetta Ronchieri
SDGs
This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.