81618 - Big Data: New Tools for Econometrics

Academic Year 2016/2017

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Economics (code 8408)

Learning outcomes

By the end of the course, the student has a good understanding of the main machine learning and statistical learning tools that economists and statisticians use to analyze large or very large datasets from several domains. In particular, he/she:

  • understands and knows how to apply key aspects of machine and statistical learning, such as out-of-sample cross-validation, regularization and scalability;
  • is familiar with the concepts of supervised and unsupervised learning, classification, regression and clustering analysis, as well as the detection of association rules;
  • understands and can apply the main learning tools, such as lasso and ridge regression, regression trees, boosting, bagging and random forests, principal components, mixture models and the k-means algorithm.

The course places special emphasis on applying the techniques discussed to training datasets, using dedicated open-source software packages.

Course contents

  1. Introduction and Overview of Statistical Learning
  2. Linear Regression as a Prediction Tool
  3. Binary and Multinomial Classification: Logistic Regression, Linear Discriminant Analysis and K-Nearest Neighbors
  4. Resampling Methods: Cross-Validation and the Bootstrap
  5. Linear Model Selection and Regularization: Ridge Regression, the LASSO and Principal Components
  6. Moving Beyond Linearity: Regression Splines, Smoothing Splines and Generalized Additive Models
  7. Tree-based Methods: CART, Bagging, Boosting and Random Forests
  8. Support Vector Machines and Neural Networks
  9. Unsupervised Learning: Hierarchical and K-Means Clustering

Readings/Bibliography

James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning, Springer 2014.

Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer 2015.

Teaching methods

For each topic we will first introduce the relevant theory, and then move as soon as possible to its empirical application in the R language. Special emphasis will be placed on the economic interpretation of the results.

Assessment methods

The final exam is written. It lasts one hour and consists of two distinct sections.
The first section is mainly theoretical and contains 5 multiple-choice questions. The second is mainly empirical and contains 11 questions whose answers should be computed using Stata and knowledge of the empirical analyses discussed in class. In both sections, each correct answer is worth two points, and no penalty is applied to wrong answers. The final mark is the total number of points obtained in the two sections.
During the exam it is forbidden to consult notes, slides or books, or to use pocket calculators or any other electronic devices. The purpose of the exam is to ascertain that students have acquired the knowledge required to correctly specify, estimate and test the econometric models discussed during the lectures, and that they are able to properly interpret the results these procedures provide.

Teaching tools

We will discuss several empirical analyses and replicate the results of a few papers using the statistical software R.

Office hours

See the website of Sergio Pastorello