95662 - INTRODUCTION TO MACHINE LEARNING

Academic Year 2024/2025

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Quantitative Finance (cod. 8854)

Learning outcomes

The main goal of the course is to present the basic elements of Machine Learning, together with a brief review of the most important elements of numerical analysis used in this field. We also present the Python ecosystem for machine learning and the functionality provided by NumPy, Matplotlib, Pandas and scikit-learn.

The course opens with a general discussion of Supervised and Unsupervised Learning. After discussing the idea of clustering, students should understand that this class of algorithms explores input data without being given an explicit output variable, and should clearly understand when to use it. We then continue with the definition of Supervised Learning, describing from a very general point of view how this class of algorithms works and when it should be used. Students should learn how to implement some of the simplest and most standard methods for modelling the relationship between independent input variables and dependent output variables. As regards decision trees and their use for prediction, students should learn the potential advantages of this technique over linear or logistic regression and how to use it in classification problems. A simple introduction to Bayesian learning is presented. Finally, we explain how different machine learning algorithms can be combined to produce composite predictions. An important example is the random forest, a procedure that generates many different decision trees and combines their results.

Students should be familiar with the following concepts: Vector Spaces, Eigenfunctions and Eigenvectors, Operator and Matrix Calculus, Calculus of Extrema, the concept of Gradient, Conditions for local and global minima, Conditional Probability, Bayes' Rule. Basic experience with Python programming is required.

Course contents

Lesson 1: Introduction to Machine Learning and Financial Data

1. Overview of Machine Learning and its Relevance to Finance
2. Types of Financial Data
3. Basic Statistics and Data Preprocessing Techniques

Practical Session: Data Collection and Cleaning Using Python

1. Introduction to Python Libraries for Financial Data Analysis
2. Data Collection
3. Data Cleaning
4. Data Transformation
5. Data Integration
6. Visualizing Data
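
As an illustration of the cleaning and transformation steps listed above, here is a minimal sketch using pandas and NumPy. The prices, the column names and the choice of forward fill are assumptions made up for this example; they are not taken from the course notebooks.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices with one duplicated row and one missing value.
raw = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03", "2024-01-03", "2024-01-04", "2024-01-05"],
        "close": [100.0, 102.0, 102.0, np.nan, 101.0],
    }
)

# Cleaning: parse the dates, drop the duplicated day, index and sort by date.
prices = (
    raw.assign(date=pd.to_datetime(raw["date"]))
    .drop_duplicates(subset="date")
    .set_index("date")
    .sort_index()
)

# Fill the missing close with the last observed value (forward fill).
prices["close"] = prices["close"].ffill()

# Transformation: daily log returns (the first row has no previous price, so it is NaN).
prices["log_return"] = np.log(prices["close"]).diff()

print(prices)
```

Forward fill is only one possible treatment of missing prices; dropping the affected rows or interpolating may be more appropriate depending on the data.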


Lesson 2: Supervised Learning for Financial and Economic Analysis

1. Fundamentals of Supervised Learning
2. Common Algorithms for Supervised Learning
3. Performance Metrics
4. Model Selection and Evaluation

Practical Session: Building and Evaluating a Regression Model for Real Estate Prediction

1. Introduction to Python Libraries for Machine Learning
2. Data Preparation
3. Feature Selection and Engineering
4. Splitting Data into Training and Testing Sets
5. Building and Training a Linear Regression Model
6. Evaluating the Model
7. Visualizing the Results
8. Improving the Model
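
The splitting, training and evaluation steps above could look roughly like the following sketch with scikit-learn. The data are synthetic (price as a linear function of floor area plus noise), and the coefficients, sample size and split ratio are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data (assumption): price = 50,000 + 3,000 * area + Gaussian noise.
area = rng.uniform(40, 200, size=200)  # floor area in square metres
price = 50_000 + 3_000 * area + rng.normal(0, 20_000, size=200)

# scikit-learn expects a 2D feature matrix, so reshape the single feature.
X = area.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

# Fit on the training set only, then evaluate on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

rmse = mean_squared_error(y_test, y_pred) ** 0.5
r2 = r2_score(y_test, y_pred)
print(f"slope={model.coef_[0]:.0f}, RMSE={rmse:.0f}, R^2={r2:.3f}")
```

Evaluating on data that the model has not seen during training is what makes the reported error an estimate of out-of-sample performance.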

Lesson 3: Unsupervised Learning and Clustering in Finance

1. Introduction to Unsupervised Learning
2. Clustering Algorithms
3. Evaluation of Clustering Models
4. Applications of Clustering in Finance

Practical Session: Applying Clustering Algorithms to Financial Data

1. Introduction to Python Libraries for Clustering
2. Data Preparation
3. Implementing k-Means Clustering
4. Implementing Hierarchical Clustering
5. Implementing DBSCAN
6. Implementing Gaussian Mixture Models (GMM)
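
A minimal sketch of the k-means step, assuming scikit-learn, is given here. The two groups of hypothetical assets, described by volatility and mean return, are generated artificially so that the expected clusters are known in advance. The other algorithms in the list are available in scikit-learn as `AgglomerativeClustering`, `DBSCAN` and `GaussianMixture`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical assets described by (volatility, mean return); two well-separated groups.
low_risk = rng.normal(loc=[0.10, 0.03], scale=0.01, size=(50, 2))
high_risk = rng.normal(loc=[0.40, 0.10], scale=0.01, size=(50, 2))
features = np.vstack([low_risk, high_risk])

# k-means relies on Euclidean distances, so put the features on a common scale first.
X = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# Silhouette score lies in [-1, 1]; values close to 1 indicate compact, well-separated clusters.
score = silhouette_score(X, labels)
print(f"silhouette score: {score:.2f}")
```

On real financial data the number of clusters is not known beforehand, so the silhouette score is typically compared across several values of `n_clusters`.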

Lesson 4: Feature Engineering and Dimensionality Reduction

1. Importance of Feature Engineering in Financial Models
2. Techniques for Feature Engineering
3. Feature Selection
4. Dimensionality Reduction

Practical Session: Applying Feature Engineering and Dimensionality Reduction Techniques

1. Introduction to Python Libraries for Feature Engineering and Dimensionality Reduction
2. Data Preparation
3. Feature Engineering
4. Feature Selection
5. Dimensionality Reduction
6. Integrating Feature Engineering and Dimensionality Reduction in a Pipeline
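
The last step, combining preprocessing and dimensionality reduction in a single pipeline, might be sketched as follows with scikit-learn. The data are synthetic: ten observed features driven by two latent factors, an assumption made so that two principal components are known to be sufficient. The step names `scale`, `pca` and `reg` are arbitrary labels.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (assumption): features 0-4 track factor 1, features 5-9 track factor 2.
factors = rng.normal(size=(300, 2))
loadings = np.repeat(np.eye(2), 5, axis=1)  # shape (2, 10)
X = factors @ loadings + 0.05 * rng.normal(size=(300, 10))
y = factors[:, 0] - 2.0 * factors[:, 1] + 0.1 * rng.normal(size=300)

# Scaling, PCA and regression are fitted together as one estimator.
pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=2)),
        ("reg", LinearRegression()),
    ]
)
pipe.fit(X, y)

# Share of total variance retained by the two principal components.
explained = pipe.named_steps["pca"].explained_variance_ratio_.sum()
r2 = pipe.score(X, y)  # in-sample R^2 of the full pipeline
print(f"explained variance: {explained:.3f}, R^2: {r2:.3f}")
```

Wrapping the steps in a pipeline ensures that the scaler and PCA are fitted only on the data passed to `fit`, which matters once the pipeline is used inside cross-validation.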

Lesson 5: Model Validation and Overfitting

1. Importance of Model Validation
2. Techniques to Prevent Overfitting
3. Hyperparameter Tuning
4. Model Evaluation Metrics

Practical Session: Implementing Model Validation and Preventing Overfitting

1. Introduction to Python Libraries for Model Validation and Overfitting Prevention
2. Data Preparation
3. Splitting Data into Training and Test Sets
4. Implementing Cross-Validation
5. Regularization Techniques
6. Hyperparameter Tuning
7. Model Evaluation
8. Visualizing Model Performance
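
As a rough sketch of how cross-validation, regularization and hyperparameter tuning fit together, the example below tunes the penalty of a ridge regression with 5-fold cross-validation in scikit-learn. The synthetic data, the grid of `alpha` values and the number of folds are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic regression problem (assumption): 20 features, only the first 3 matter.
X = rng.normal(size=(200, 20))
y = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + 0.5 * rng.normal(size=200)

# Hold out a test set that is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Ridge adds an L2 penalty whose strength is controlled by alpha.
pipe = make_pipeline(StandardScaler(), Ridge())
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validation on the training set for each candidate alpha.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

best_alpha = search.best_params_["ridge__alpha"]
test_r2 = search.best_estimator_.score(X_test, y_test)  # R^2 on the held-out data
print(f"best alpha: {best_alpha}, test R^2: {test_r2:.3f}")
```

The random split used here assumes independent observations; whether that assumption is acceptable for a given financial dataset has to be checked case by case.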

Readings/Bibliography

- "Machine Learning for Asset Managers" by Marcos López de Prado
- "Python for Data Analysis" by Wes McKinney
- "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron
- "An Introduction to Statistical Learning" by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- Online tutorials and documentation for Pandas, NumPy, Matplotlib, and Scikit-Learn

Teaching methods

Lectures and exercises

Assessment methods

The exam will consist of two Python exercises to be solved using a notebook provided by the instructor.

Teaching tools

Slides, Python notebooks

Office hours

See the website of Umberto Cherubini