85302 - Data Science

Academic Year 2020/2021

  • Modules: Laura Anderlucci (Module 1), Laura Anderlucci (Module 2)
  • Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
  • Campus: Bologna
  • Degree programme: First cycle degree programme (L) in Genomics (code 9211)

Learning outcomes

The course introduces current data science methods and techniques, using modern computational tools and software, with an emphasis on rigorous statistical thinking. By the end of the course, students are able to represent and organise knowledge about large-scale data collections, and to turn data into actionable knowledge by combining concepts from statistical learning and data mining with data visualization techniques and reproducible data analysis.

Course contents

Part 0: Introduction to Statistical Learning

Part I: Classification

  • Naïve Bayes
  • Logistic Regression
  • Linear Discriminant Analysis
  • k-Nearest Neighbors

Part II: Resampling Methods

  • Cross-Validation
  • The Bootstrap

Part III: Tree-Based Methods

  • Classification trees
  • Bagging; Random Forests; Boosting

Part IV: Unsupervised Learning

  • k-means
  • Hierarchical clustering

Part V: Overview of the main machine learning methods

  • Support Vector Machines
  • Neural Networks

Readings/Bibliography

The primary text for the course:

 

In addition, we will use:

Teaching methods

Lectures and practical sessions.

Assessment methods

Written exam.

Teaching tools

The following material will be provided: lecture slides, exercises with solutions, and a mock exam.

Office hours

See the website of Laura Anderlucci