- Teacher: Umberto Cherubini
- Credits: 3
- SSD: SECS-S/06
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Bologna
- Course: Second cycle degree programme (LM) in Quantitative Finance (cod. 8854)
- Period: from Feb 12, 2025 to Mar 12, 2025
Learning outcomes
The main goal of the course is to present the first elements of Machine Learning, together with a brief review of the most important elements of numerical analysis used in this field. We also present the Python ecosystem for machine learning and the functionality it provides through NumPy, Matplotlib, Pandas and scikit-learn. A general discussion of supervised and unsupervised learning is introduced. After discussing the idea of clustering, students should understand that this class of algorithms explores input data without being given an explicit output variable, and they should clearly understand when to use it. We then continue with the definition of supervised learning, describing from a very general point of view how this class of algorithms works and when it should be used. Students should learn how to implement some of the simplest and most standard methods for modelling the relationship between independent input variables and dependent output variables. As regards decision trees and how they can be used for prediction, students should learn the potential advantages of this technique over linear or logistic regression and how to use it in classification problems. A simple introduction to Bayesian learning is also presented. Finally, we explain how different machine learning algorithms can be combined to produce composite predictions. An important example is the random forest, a procedure that generates many different decision trees and combines their results.

Students should be familiar with the following concepts: Vector Spaces, Eigenfunctions and Eigenvectors, Operator and Matrix Calculus, Calculus of Extrema, the concept of Gradient, Conditions for local and global minima, Conditional Probability, Bayes' Rule. Basic experience with Python programming is required.
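The contrast between a single decision tree and a random forest can be illustrated with a short scikit-learn sketch. It assumes synthetic data from `make_classification` in place of real financial data, and the parameters (1,000 samples, 200 trees) are illustrative choices, not course material.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption standing in for real data)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A single, fully grown decision tree
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A random forest: many trees, each fitted on a bootstrap sample and considering a random
# subset of features at each split; predictions combine the trees' class probabilities
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print(f"single tree accuracy:   {tree_acc:.3f}")
print(f"random forest accuracy: {forest_acc:.3f}")
print("number of trees in the forest:", len(forest.estimators_))
```

Averaging many decorrelated trees typically reduces the variance of a single deep tree, which is why the forest usually scores higher on the held-out set.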
Course contents
Lesson 1: Introduction to Machine Learning and Financial Data
1. Overview of Machine Learning and its Relevance to Finance
2. Types of Financial Data
3. Basic Statistics and Data Preprocessing Techniques
Practical Session: Data Collection and Cleaning Using Python
1. Introduction to Python Libraries for Financial Data Analysis
2. Data Collection
3. Data Cleaning
4. Data Transformation
5. Data Integration
6. Visualizing Data
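The cleaning, transformation and integration steps of this session can be sketched with Pandas alone. The tickers `AAA` and `BBB`, their prices and the sector table are hypothetical, invented only to show the operations.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes: one missing price and one exact duplicate row (the last one)
dates = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.DataFrame({
    "date": list(dates) + [dates[3]],
    "AAA": [100.0, 101.0, np.nan, 103.0, 102.0, 104.0, 103.0],
    "BBB": [50.0, 50.5, 51.0, 50.8, 51.2, 51.5, 50.8],
})

# Data cleaning: drop duplicate rows, index by date, carry the last price forward over gaps
clean = prices.drop_duplicates().set_index("date").sort_index().ffill()
print(clean.describe())  # basic descriptive statistics

# Data transformation: daily log returns (the first row has no predecessor and is dropped)
returns = np.log(clean).diff().dropna()

# Data integration: attach a hypothetical sector label to each ticker
sectors = pd.DataFrame({"ticker": ["AAA", "BBB"], "sector": ["Banks", "Energy"]})
long = returns.reset_index().melt(id_vars="date", var_name="ticker", value_name="log_return")
merged = long.merge(sectors, on="ticker", how="left")
print(merged.groupby("sector")["log_return"].mean())
```

For the visualization step, `clean.plot()` draws the cleaned price series through Matplotlib.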
Lesson 2: Supervised Learning for Financial and Economic Analysis
1. Fundamentals of Supervised Learning
2. Common Algorithms for Supervised Learning
3. Performance Metrics
4. Model Selection and Evaluation
Practical Session: Building and Evaluating a Regression Model for Real Estate Prediction
1. Introduction to Python Libraries for Machine Learning
2. Data Preparation
3. Feature Selection and Engineering
4. Splitting Data into Training and Testing Sets
5. Building and Training a Linear Regression Model
6. Evaluating the Model
7. Visualizing the Results
8. Improving the Model
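A compact version of the regression workflow (preparation, train/test split, fitting, evaluation) might look as follows. The housing features and the pricing rule that generates `price` are assumptions made up for illustration; the course's actual real estate dataset is not specified here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
# Hypothetical housing data: features and pricing rule are invented for illustration
df = pd.DataFrame({
    "size_m2": rng.uniform(40, 200, n),
    "rooms": rng.integers(1, 6, n),
    "dist_center_km": rng.uniform(0, 15, n),
})
df["price"] = (
    3000 * df["size_m2"] + 10000 * df["rooms"] - 8000 * df["dist_center_km"]
    + rng.normal(0, 20000, n)  # unexplained noise
)

X = df[["size_m2", "rooms", "dist_center_km"]]
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit ordinary least squares on the training set only
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the held-out test set
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(dict(zip(X.columns, model.coef_.round(1))))
print(f"MAE={mae:,.0f}  RMSE={rmse:,.0f}  R^2={r2:.3f}")
```

Because the data were generated from a known linear rule, the fitted coefficients should land close to 3000, 10000 and -8000, which is a useful sanity check before moving to real data.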
Lesson 3: Unsupervised Learning and Clustering in Finance
1. Introduction to Unsupervised Learning
2. Clustering Algorithms
3. Evaluation of Clustering Models
4. Applications of Clustering in Finance
Practical Session: Applying Clustering Algorithms to Financial Data
1. Introduction to Python Libraries for Clustering
2. Data Preparation
3. Implementing k-Means Clustering
4. Implementing Hierarchical Clustering
5. Implementing DBSCAN
6. Implementing Gaussian Mixture Models (GMM)
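The four clustering algorithms listed above are all available in scikit-learn. This sketch assumes synthetic two-dimensional blobs standing in for financial features (for instance an asset's average return and volatility); `eps=0.3` for DBSCAN is an illustrative value that would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic data with three well-separated groups (an assumption, not real market data)
X_raw, y_true = make_blobs(n_samples=300, centers=[[0, 0], [4, 4], [-4, 4]],
                           cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X_raw)  # put both features on the same scale

labels = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "hierarchical (Ward)": AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X),
    "GMM": GaussianMixture(n_components=3, random_state=0).fit(X).predict(X),
}

for name, lab in labels.items():
    # Silhouette uses only the data; ARI compares with the true groups,
    # which are known here only because the data are synthetic
    print(f"{name:20s} silhouette={silhouette_score(X, lab):.3f} "
          f"ARI={adjusted_rand_score(y_true, lab):.3f}")

# DBSCAN chooses the number of clusters itself and labels outliers as -1
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
n_clusters = len(set(db_labels)) - (1 if -1 in db_labels else 0)
print(f"DBSCAN found {n_clusters} clusters and {np.sum(db_labels == -1)} noise points")
```

k-means, Ward linkage and GMM all need the number of clusters in advance, whereas DBSCAN infers it from density and can flag outliers, which is one criterion for choosing among them.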
Lesson 4: Feature Engineering and Dimensionality Reduction
1. Importance of Feature Engineering in Financial Models
2. Techniques for Feature Engineering
3. Feature Selection
4. Dimensionality Reduction
Practical Session: Applying Feature Engineering and Dimensionality Reduction Techniques
1. Introduction to Python Libraries for Feature Engineering and Dimensionality Reduction
2. Data Preparation
3. Feature Engineering
4. Feature Selection
5. Dimensionality Reduction
6. Integrating Feature Engineering and Dimensionality Reduction in a Pipeline
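Feature engineering, feature selection and PCA can be chained in a single scikit-learn `Pipeline`. The sketch below assumes a simulated random-walk price series, so no real predictive power should be expected; the feature names and window lengths (1, 5, 20 and 60 days) are hypothetical choices meant only to show the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical price path: a geometric random walk over 750 trading days
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 750))), name="price")

# Feature engineering: predictors built from past prices only
log_ret = np.log(prices).diff()
features = pd.DataFrame({
    "ret_1d": log_ret,
    "ret_5d": np.log(prices).diff(5),
    "vol_20d": log_ret.rolling(20).std(),
    "ma_gap_20d": prices / prices.rolling(20).mean() - 1,
    "mom_60d": np.log(prices).diff(60),
})
target = log_ret.shift(-1).rename("next_ret")  # next-day return to be predicted
data = pd.concat([features, target], axis=1).dropna()
X, y = data[features.columns], data["next_ret"]

pipe = Pipeline([
    ("scale", StandardScaler()),                             # PCA is scale-sensitive
    ("select", SelectKBest(score_func=f_regression, k=3)),   # keep the 3 top-scoring features
    ("pca", PCA(n_components=2)),                            # compress them to 2 components
    ("model", LinearRegression()),
])
pipe.fit(X, y)

kept = X.columns[pipe.named_steps["select"].get_support()]
print("selected features:", list(kept))
print("explained variance ratio:", pipe.named_steps["pca"].explained_variance_ratio_.round(3))
```

Wrapping the steps in a pipeline ensures that scaling, selection and PCA are fitted only on the data passed to `fit`, which matters once the pipeline is cross-validated.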
Lesson 5: Model Validation and Overfitting
1. Importance of Model Validation
2. Techniques to Prevent Overfitting
3. Hyperparameter Tuning
4. Model Evaluation Metrics
Practical Session: Implementing Model Validation and Preventing Overfitting
1. Introduction to Python Libraries for Model Validation and Overfitting Prevention
2. Data Preparation
3. Splitting Data into Training and Test Sets
4. Implementing Cross-Validation
5. Regularization Techniques
6. Hyperparameter Tuning
7. Model Evaluation
8. Visualizing Model Performance
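The validation workflow (hold-out test set, K-fold cross-validation, regularization and hyperparameter tuning) can be sketched as follows. The synthetic dataset, with many features and few observations, is an assumption chosen because it makes plain least squares prone to overfitting; the grid of `alpha` values is likewise illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 60 features, only 5 informative, few observations
X, y = make_regression(n_samples=120, n_features=60, n_informative=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: ordinary least squares, scored by 5-fold cross-validation on the training set
ols = make_pipeline(StandardScaler(), LinearRegression())
ols_cv = cross_val_score(ols, X_train, y_train, cv=cv, scoring="r2")

# Regularized models: tune the penalty strength alpha by grid search over the same folds
alphas = np.logspace(-2, 3, 20)
searches = {}
for name, est in {"ridge": Ridge(), "lasso": Lasso(max_iter=10_000)}.items():
    pipe = make_pipeline(StandardScaler(), est)  # step names are "standardscaler", "ridge"/"lasso"
    grid = GridSearchCV(pipe, {f"{name}__alpha": alphas}, cv=cv, scoring="r2")
    searches[name] = grid.fit(X_train, y_train)

print(f"OLS   mean CV R^2 = {ols_cv.mean():.3f}")
for name, grid in searches.items():
    print(f"{name:5s} best alpha = {grid.best_params_[f'{name}__alpha']:.3g}, "
          f"mean CV R^2 = {grid.best_score_:.3f}")

# Final check on the held-out test set, untouched during tuning
ols.fit(X_train, y_train)
test_r2 = {"ols": r2_score(y_test, ols.predict(X_test))}
for name, grid in searches.items():
    test_r2[name] = grid.score(X_test, y_test)  # uses the best model refitted on all training data
print({k: round(v, 3) for k, v in test_r2.items()})
```

For time-ordered financial data, `TimeSeriesSplit` from `sklearn.model_selection` is often preferred to shuffled K-fold, since it never trains on observations that come after the validation fold.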
Readings/Bibliography
- "Machine Learning for Asset Managers" by Marcos López de Prado
- "Python for Data Analysis" by Wes McKinney
- "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron
- "An Introduction to Statistical Learning" by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- Online tutorials and documentation for Pandas, NumPy, Matplotlib and Scikit-Learn
Teaching methods
Lectures and exercises
Assessment methods
The exam will consist of two Python exercises to be solved using a notebook provided by the instructor.
Teaching tools
Slides, Python notebooks
Office hours
See the website of Umberto Cherubini