- Teacher: Sergio Pastorello
- Credits: 6
- SSD: SECS-P/01
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Bologna
Course:
Second cycle degree programme (LM) in Mathematics (cod. 8208)
Also valid for Second cycle degree programme (LM) in Economics (cod. 8408)
Learning outcomes
At the end of the course the student has a good understanding of the main tools used by economists and statisticians in machine learning and statistical learning to analyze large datasets from several domains. In particular, the student:
- understands and knows how to apply key aspects of machine and statistical learning, such as out-of-sample cross-validation, regularization and scalability.
- is familiar with the concepts of supervised and unsupervised learning, classification, regression and clustering analysis, as well as the detection of association rules.
- understands and can apply the main learning tools, such as lasso and ridge regression, regression trees, boosting, bagging and random forests, principal components, mixture models and the k-means algorithm.
The course places special emphasis on applying the techniques discussed to training datasets, using dedicated open-source software packages.
Course contents
- Introduction and Overview of Statistical Learning
- Linear Regression as a Prediction Tool
- Binary and Multinomial Classification: Logistic Regression, Linear Discriminant Analysis and K-Nearest Neighbors
- Resampling Methods: Cross-Validation and the Bootstrap
- Linear Model Selection and Regularization: Ridge Regression, the LASSO and Principal Components
- Moving Beyond Linearity: Regression Splines, Smoothing Splines and Generalized Additive Models
- Tree-based Methods: CART, Bagging, Boosting and Random Forests
- Support Vector Machines and Neural Networks
- Unsupervised Learning: Hierarchical and K-Means Clustering
Readings/Bibliography
James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning, Springer 2014.
Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer 2015.
Teaching methods
For each topic we will first introduce the relevant theory, and then move as soon as possible to its empirical application in the R language. Special emphasis will be placed on the economic interpretation of the results.
Assessment methods
One problem set (33%), final project plus discussion (67%).
Teaching tools
We will discuss several empirical analyses and replicate the results of a few papers using the statistical software R.
Office hours
See the website of Sergio Pastorello