81618 - BIG DATA: NEW TOOLS FOR ECONOMETRICS

Academic Year 2017/2018

  • Instructor: Sergio Pastorello
  • Credits: 6
  • Scientific disciplinary sector (SSD): SECS-P/05
  • Teaching language: English
  • Teaching mode: Traditional, in-person lectures
  • Campus: Bologna
  • Degree programme: Second-cycle degree (Laurea Magistrale) in Economics (code 8408)

    Also valid for the Second-cycle degree (Laurea Magistrale) in Mathematics (code 8208)

Learning outcomes

By the end of the course, students have a good understanding of the main machine learning and statistical learning tools used by economists and statisticians to analyze large datasets from a variety of domains. In particular, students:

  • understand and know how to apply key aspects of machine and statistical learning, such as out-of-sample cross-validation, regularization and scalability;
  • are familiar with the concepts of supervised and unsupervised learning, classification, regression and cluster analysis, as well as the detection of association rules;
  • understand and can apply the main learning tools, such as lasso and ridge regression, regression trees, boosting, bagging and random forests, principal components, mixture models and the k-means algorithm.

The course places special emphasis on applying the techniques discussed to training datasets, using dedicated open-source software packages.

Course contents

  1. Introduction and Overview of Statistical Learning
  2. Linear Regression as a Prediction Tool
  3. Binary and Multinomial Classification: Logistic Regression, Linear Discriminant Analysis and K-Nearest Neighbors
  4. Resampling Methods: Cross-Validation and the Bootstrap
  5. Linear Model Selection and Regularization: Ridge Regression, the LASSO and Principal Components
  6. Moving Beyond Linearity: Regression Splines, Smoothing Splines and Generalized Additive Models
  7. Tree-based Methods: CART, Bagging, Boosting and Random Forests
  8. Support Vector Machines and Neural Networks
  9. Unsupervised Learning: Hierarchical and K-Means Clustering

Readings/Bibliography

James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning, Springer 2014.

Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer 2015.

Teaching methods

For each topic, we will first introduce the relevant theory and then move as quickly as possible to its empirical application in the R language. Special emphasis will be placed on the economic interpretation of the results.

Assessment methods

One problem set (33%) and a final project with discussion (67%).

Teaching tools

We will discuss several empirical analyses and replicate the results of a few papers using the statistical software R.

Office hours

See Sergio Pastorello's website