95662 - INTRODUCTION TO MACHINE LEARNING

Academic Year 2022/2023

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Degree Programme: Second cycle degree programme (LM) in Quantitative Finance (cod. 8854)

Learning outcomes

The main goal of the course is to present the first elements of Machine Learning, accompanied by a brief review of the most important elements of numerical analysis used in this field. We also present the Python ecosystem for machine learning and the functionality provided by NumPy, Matplotlib, Pandas and scikit-learn. A general discussion of supervised and unsupervised learning is introduced. After discussing the idea of clustering, the student should learn that this class of algorithms explores input data without being given an explicit output variable, and should clearly understand when to use it. We then continue with the definition of supervised learning, describing from a very general point of view how this class of algorithms works and when it should be used. Students should learn how to implement some of the simplest and most standard methods for modelling the relationship between independent input variables and dependent output variables. As regards decision trees and how they can be used for prediction, the student should learn the potential advantages of this technique over linear or logistic regression and how to use it in classification problems. A simple introduction to Bayesian learning is presented. Finally, we explain how different machine learning algorithms can be combined to produce composite predictions; an important example is the random forest, a procedure for generating many different decision trees and combining their results.

Students should be familiar with the following concepts: Vector Spaces, Eigenfunctions and Eigenvectors, Operator and Matrix Calculus, Calculus of Extrema, the concept of Gradient, Conditions for local and global minima, Conditional Probability, Bayes' Rule. Basic experience with Python programming is required.

Course contents

  1. Introduction to ML
    • 1.1.What is ML: a shift from knowledge to data
    • 1.2.Kinds of problems
      • 1.2.1.supervised versus unsupervised
      • 1.2.2.regression vs classification
    • 1.3.Data pipeline
    • 1.4.Python Basics
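    As a minimal illustration of the Python toolchain used throughout the course (NumPy, Pandas, Matplotlib), the sketch below builds and inspects a small synthetic dataset; all values are invented for illustration.

      import numpy as np
      import pandas as pd
      import matplotlib.pyplot as plt

      # Build a small synthetic dataset with NumPy and wrap it in a Pandas DataFrame.
      rng = np.random.default_rng(seed=0)
      df = pd.DataFrame({"x": rng.normal(size=100)})
      df["y"] = 2.0 * df["x"] + rng.normal(scale=0.5, size=100)

      # First steps of a data pipeline: quick inspection and a scatter plot.
      print(df.describe())
      df.plot.scatter(x="x", y="y")
      plt.show()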

  2. Data Preprocessing
    • 2.1.Data Normalization
    • 2.2.Categorical variables: ordinal and non-ordinal
    • 2.3.Outliers
    • 2.4.Feature Engineering
    • 2.5.Dimensionality reduction: PCA
    • 2.6.Examples in Python: scikit-learn
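    A minimal scikit-learn sketch of the preprocessing topics in this unit (normalization, ordinal and non-ordinal categorical encoding, PCA); the toy data and column names are invented for illustration.

      import pandas as pd
      from sklearn.compose import ColumnTransformer
      from sklearn.decomposition import PCA
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler, OrdinalEncoder, OneHotEncoder

      # Toy data: one numeric column, one ordinal and one non-ordinal categorical column.
      df = pd.DataFrame({
          "income": [30_000, 52_000, 75_000, 41_000],
          "education": ["high school", "bachelor", "master", "bachelor"],
          "city": ["Bologna", "Milano", "Roma", "Bologna"],
      })

      preprocess = ColumnTransformer(
          [
              ("num", StandardScaler(), ["income"]),   # normalization
              ("ord", OrdinalEncoder(categories=[["high school", "bachelor", "master"]]), ["education"]),
              ("nom", OneHotEncoder(), ["city"]),      # non-ordinal categories
          ],
          sparse_threshold=0.0,  # force a dense matrix so PCA can be applied downstream
      )

      # Dimensionality reduction on the encoded features.
      pipe = Pipeline([("prep", preprocess), ("pca", PCA(n_components=2))])
      X_reduced = pipe.fit_transform(df)
      print(X_reduced.shape)  # (4, 2)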

  3. Linear Regression
    • 3.1.Estimating the coefficients: Least Squares Method & maximum likelihood
    • 3.2.Performance metrics
    • 3.3.Interpreting the coefficients
    • 3.4.The problem of Collinearity
    • 3.5.Selecting the relevant variables: Lasso/Ridge regression
    • 3.6.Kernel Regression
    • 3.7.Python hands-on
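    A minimal sketch of ordinary least squares, Ridge and Lasso on synthetic data, assuming scikit-learn; the coefficients and noise level are invented for illustration (Lasso should shrink the irrelevant third coefficient towards zero).

      import numpy as np
      from sklearn.linear_model import LinearRegression, Ridge, Lasso
      from sklearn.metrics import mean_squared_error, r2_score
      from sklearn.model_selection import train_test_split

      # Synthetic data: y depends on the first two features only; the third is noise.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 3))
      y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
          model.fit(X_train, y_train)
          pred = model.predict(X_test)
          print(type(model).__name__,
                "coef:", np.round(model.coef_, 2),
                "MSE:", round(mean_squared_error(y_test, pred), 3),
                "R2:", round(r2_score(y_test, pred), 3))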

  4. Logistic Regression
    • 4.1.Problem Definition
    • 4.2.Estimating the coefficients: gradient descent
    • 4.3.Classification Metrics:
      • 4.3.1.Precision
      • 4.3.2.Recall
      • 4.3.3.F-beta score
      • 4.3.4.Area Under the ROC curve
    • 4.4.Interpreting the coefficients
    • 4.5.Generalized linear model: Poisson regression
    • 4.6.Multilabel case
    • 4.7.Python hands-on
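    A minimal scikit-learn sketch of binary logistic regression and the classification metrics listed in this unit; the synthetic data and the choice beta = 0.5 are only for illustration.

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import precision_score, recall_score, fbeta_score, roc_auc_score
      from sklearn.model_selection import train_test_split

      # Synthetic binary classification problem.
      X, y = make_classification(n_samples=500, n_features=5, random_state=0)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
      y_pred = clf.predict(X_test)
      y_score = clf.predict_proba(X_test)[:, 1]  # class probabilities for the ROC curve

      print("precision:", precision_score(y_test, y_pred))
      print("recall:   ", recall_score(y_test, y_pred))
      print("F-beta:   ", fbeta_score(y_test, y_pred, beta=0.5))
      print("ROC AUC:  ", roc_auc_score(y_test, y_score))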

  5. Evaluate a Model
    • 5.1.Cross-validation & hyper parameter tuning
    • 5.2.Bias-Variance trade-off
    • 5.3.Simple cross-validation
    • 5.4.N-fold cross-validation
    • 5.5.Python hands-on
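    A minimal sketch of N-fold cross-validation and grid-search hyperparameter tuning, assuming scikit-learn; the model, grid values and synthetic data are invented for illustration.

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import KFold, cross_val_score, GridSearchCV

      X, y = make_classification(n_samples=300, n_features=10, random_state=0)

      # N-fold cross-validation of a single model.
      cv = KFold(n_splits=5, shuffle=True, random_state=0)
      scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
      print("fold accuracies:", scores.round(3), "mean:", round(scores.mean(), 3))

      # Hyperparameter tuning: grid search over the regularization strength C.
      grid = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                          cv=cv)
      grid.fit(X, y)
      print("best params:", grid.best_params_, "best CV score:", round(grid.best_score_, 3))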

  6. Tree-Based Methods
    • 6.1.Simple CART for regression and classification
    • 6.2.Ensemble methods: Random Forest
    • 6.3.Boosting methods
    • 6.4.Python hands-on
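    A minimal scikit-learn sketch comparing a single CART tree with a random forest (bagging) and gradient boosting; the synthetic data and hyperparameters are invented for illustration.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=500, n_features=8, random_state=0)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      # A single CART tree, a bagging ensemble (random forest) and a boosting ensemble.
      models = {
          "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
          "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
          "gradient boosting": GradientBoostingClassifier(random_state=0),
      }
      for name, model in models.items():
          model.fit(X_train, y_train)
          print(name, "test accuracy:", round(model.score(X_test, y_test), 3))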

  7. Unsupervised Learning
    • 7.1.Problems
    • 7.2.K-means
    • 7.3.Density-Based Model: DBSCAN
    • 7.4.Remove outliers using unsupervised methods
    • 7.5.Python hands-on
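    A minimal sketch of K-means and DBSCAN on synthetic blobs, assuming scikit-learn; the two injected outliers and the eps/min_samples values are invented for illustration (DBSCAN labels isolated points -1, which can be used to remove outliers).

      import numpy as np
      from sklearn.cluster import KMeans, DBSCAN
      from sklearn.datasets import make_blobs

      # Three well-separated blobs plus two artificial outliers.
      X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
      X = np.vstack([X, [[20.0, 20.0], [-20.0, 20.0]]])

      # K-means: every point is forced into one of k clusters.
      kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
      print("K-means cluster sizes:", np.bincount(kmeans_labels))

      # DBSCAN: density-based clustering; points labelled -1 are treated as noise.
      db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
      mask = db_labels != -1  # keep only non-outlier points
      print("points flagged as outliers:", int((~mask).sum()))
      print("cleaned data shape:", X[mask].shape)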

Readings/Bibliography

  • James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. New York: Springer, 2013.
  • Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Vol. 2. New York: Springer, 2009.
  • Rogers, Simon, and Mark Girolami. A First Course in Machine Learning. Chapman and Hall/CRC, 2016.

Assessment methods

The final exam will consist of a machine learning project. During the exam, the student will have to present the developed project and discuss its main aspects as well as the underlying theory.

Teaching tools

  • Slides (PowerPoint/PDF)
  • Selected literature
  • Jupyter Notebook and Python Code

Office hours

See the website of Matteo Amabili