90352 - MACHINE LEARNING FOR ECONOMISTS

Course Unit Page

Academic Year 2020/2021

Learning outcomes

At the end of the course the student will have a good understanding of the main tools used in machine learning. In particular, he/she:

- understands and knows how to apply key aspects of machine and statistical learning, such as out-of-sample cross-validation, regularization and scalability;

- is familiar with the concepts of supervised learning, regression and classification;

- understands and can apply the main learning tools, such as lasso and ridge regression, regression trees, boosting, bagging, random forests, support vector machines and neural networks.

The course will put special emphasis on empirical applications using the R software.

Course contents

  1. Introduction and Overview of Statistical Learning
  2. Linear Regression as a Prediction Tool
  3. Binary and Multinomial Classification: Logistic Regression, Linear Discriminant Analysis and K-Nearest Neighbors
  4. Resampling Methods: Cross-Validation and the Bootstrap
  5. Linear Model Selection and Regularization: Ridge Regression, the LASSO and Principal Components
  6. Moving Beyond Linearity: Regression Splines, Smoothing Splines and Generalized Additive Models
  7. Tree-based Methods: CART, Bagging, Boosting and Random Forests
  8. Support Vector Machines and Neural Networks
  9. Unsupervised Learning: Hierarchical and K-Means Clustering

Readings/Bibliography

James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning, Springer 2014.

Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer 2015.

Teaching methods

For each topic we will first introduce the relevant theory, and then move as soon as possible to its empirical application in the R language. Special emphasis will be placed on the economic interpretation of the results.

Assessment methods

Final project plus discussion.

The structure of the final project should be the following:

1. Data description and motivation

- Motivation. State your final objective: outcome, predictors, ...

- Explore the data. Tools: plots, PCA, clustering

2. Set up and assess a prediction model

- Prediction. Set up and assess forecasting models

- Tools. Linear/logistic regression, discriminant analysis, KNN, PCR, PLS, stepwise selection, regularization methods (Ridge, LASSO, Elastic Net, ...), splines, trees, SVM, ANN, ...

Additional information about the final project:

3. Data

- Use your own data and provide it

- Data must allow exploration and prediction

- Data don’t need to be large!

- Interesting data repositories on the Web: Kaggle, the UCI Machine Learning Repository, and others to be provided during classes

4. Tools:

- You should use several of the tools covered in class (not necessarily all of them)

- Make sure the data and your goals are compatible

- Work in groups of 4. Inform the instructor of the composition of your group as soon as possible

- The final project must be handed in 5 days before the discussion and must include: (i) the data, in a format easily readable by R; (ii) an R Markdown document that describes the task, reads the data, performs the computations and outputs the results; (iii) the PDF file generated by that R Markdown document.

- The PDF file must not exceed 20 pages using the default R Markdown options (font size, page margins, etc.)

5. Assessment:

- The final project assessment will take into account the difficulty posed by data cleaning and preparation

- The final project grade will be a weighted combination of the assessment of the written project and of the oral discussion.

6. Grade rejection:

- Students may reject the grade obtained in the exam once. To do so, they must email a request to the instructor by the deadline set for grade registration. The instructor will confirm receipt of the request by the same date.

- Rejection applies to the whole exam, whose grade is the weighted average of the grades obtained in the homework and the final project. If the grade is rejected, the student must retake the full exam (which then consists of the final project only).

Teaching tools

We will discuss several empirical analyses and replicate the results of a few papers using the statistical software R and several of its packages.

Office hours

See the website of Sergio Pastorello