81610 - MACHINE LEARNING

• Campus: Bologna
• Course: First cycle degree programme (L) in Economics and Finance (cod. 8835)
• from Feb 28, 2024 to Mar 13, 2024

Learning outcomes

The course introduces students to some of the most important Machine Learning predictive models that can contribute to empirical economics: regularization methods, tree-based methods, Support Vector Machines and Neural Networks. For each topic we will outline its structure, discuss its pros and cons, and focus on the issues that arise when applying it to economic problems using the software R.

Course contents

Module 2

1. Regularization methods: Ridge regression, the LASSO, Elastic Net
2. Tree-based methods: Bagging, Random Forests, Boosting, Ensemble Methods
3. Support Vector Machines in regression and classification tasks
4. Neural Networks, with applications to numeric and textual data
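As a small illustration of the first topic, the sketch below shows how Ridge, the LASSO and Elastic Net shrink a single regression coefficient. It is written in plain Python rather than the R used in class, purely as an illustration. It assumes one centered predictor, no intercept, and the penalized objective sum((y - b*x)^2) + lam * (alpha*|b| + (1 - alpha)*b^2). This parametrization, the function names and the toy data are all hypothetical choices made for this example, and R packages may scale the penalty differently.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_1d(x, y, lam, alpha):
    """Closed-form minimizer of
    sum((y - b*x)^2) + lam * (alpha*|b| + (1 - alpha)*b^2)
    for a single predictor x without intercept (assumed setup).
    alpha = 0 gives Ridge, alpha = 1 gives the LASSO."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam * alpha / 2) / (sxx + lam * (1 - alpha))


def ols_1d(x, y):
    """Unpenalized least squares slope (lam = 0)."""
    return elastic_net_1d(x, y, lam=0.0, alpha=0.0)


def ridge_1d(x, y, lam):
    """Ridge: proportional shrinkage, never exactly zero for finite lam."""
    return elastic_net_1d(x, y, lam, alpha=0.0)


def lasso_1d(x, y, lam):
    """LASSO: shifts the estimate toward zero and can set it to zero."""
    return elastic_net_1d(x, y, lam, alpha=1.0)


# Toy data (made up for illustration): y is roughly 2 * x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.9]

print("OLS        :", ols_1d(x, y))
print("Ridge      :", ridge_1d(x, y, lam=10.0))
print("LASSO      :", lasso_1d(x, y, lam=10.0))
print("LASSO big  :", lasso_1d(x, y, lam=50.0))
print("Elastic Net:", elastic_net_1d(x, y, lam=10.0, alpha=0.5))
```

In this setup Ridge rescales the least squares slope, while the LASSO subtracts a fixed amount and returns exactly zero once the penalty is large enough. This is why the LASSO can also act as a variable selection device.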

James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning, Springer 2021 (second edition).

Teaching methods

For each topic we will first introduce the relevant theory, and then move as soon as possible to its empirical application in the R language. Special emphasis will be placed on the economic interpretation and relevance of the results. Class attendance is important, especially for learning the empirical parts of the course.

Assessment methods

The exam is joint with Module 1.

The exam tests three things: the ability to apply the methods learnt to simulated or real data using R, knowledge of the theoretical concepts, and the ability to interpret estimation results in light of the underlying theory.

The exam consists of a group project followed by an oral discussion.

The structure of the final project should be the following:

1. Data description and motivation

- Motivation. State your final objective: outcome, predictors, ...

- Explore the data. Tools: plots, PCA, clustering

2. Set up and assess a prediction model

- Prediction. Set up and assess forecasting models

- Tools. Linear/logistic regression, Discriminant analysis, KNN, PCR, PLS, stepwise selection, regularization methods (Ridge, LASSO, Elastic Net, ...), splines, trees, SVM, NN, ...

3. Data

- Use your own data and provide it (during the course we will illustrate several interesting data repositories on the Web: Kaggle, UCI Machine Learning, and others)

- Data must allow exploration and prediction

- Data don’t need to be huge!

4. Tools:

- You should use the most important tools from class (not necessarily all of them)

- Make sure the data and your goals are compatible

- Work in groups of 3 to 5 students. Let the instructors know the groups’ composition as soon as possible

- The final project must be handed in 5 days before the discussion, including: (i) the data in a format easily readable by R; (ii) the R code that reads the data, does the computations and outputs the results; (iii) the PDF document that illustrates the project.

- The PDF document must be at most 20 pages long, including tables and figures.

5. Assessment:

- The final project assessment will take into account the difficulty posed by data cleaning and preparation

- The final grade will take into account both the project and the oral discussion.

- The maximum possible score is 30 cum laude. Grades are classified as follows:

<18: failed
18-23: sufficient
24-27: good
28-30: very good
30 cum laude: excellent