85302 - Data Science

Academic Year 2025/2026

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: First cycle degree programme (L) in Genomics (cod. 9211)

Learning outcomes

The course introduces students to current data science methods and techniques, using modern computational tools and software with an emphasis on rigorous statistical thinking. By the end of the course, students are able to represent and organise knowledge about large-scale data collections, and to turn data into actionable knowledge by combining concepts of statistical learning and data mining with data visualization techniques and reproducible data analysis.

Course contents

Part I: Introduction to Statistical Learning

Part II: Data Visualization and Reporting

Part III: Supervised Learning

  • Cross-Validation
  • Naïve Bayes
  • Logistic Regression
  • k-Nearest Neighbors
  • Nearest Shrunken Centroid
  • Regression and Classification Trees
  • Introduction to the Bootstrap
  • Bagging, Random Forests, and Boosting

Part IV: Unsupervised Learning

  • k-means
  • Hierarchical clustering
  • Gap Statistic and clustering quality measures

Part V [Optional]: Overview of the main machine learning methods

  • Support Vector Machines

Readings/Bibliography

The primary text for the course:

  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed.). New York: Springer. ISBN: 978-1-0716-1417-4. E-book ISBN: 978-1-0716-1418-1.

    The book is freely available here:
    https://www.statlearning.com/

In addition, we will use:

Teaching methods

Lectures complemented by practical sessions.

Regarding the teaching methods of this course unit, all students must complete Modules 1 and 2 of the online Health and Safety course [http://www.unibo.it/en/services-and-opportunities/health-and-assistance/health-and-safety/online-course-on-health-and-safety-in-study-and-internship-areas].


Assessment methods

The learning assessment consists of a written/practical test. The test assesses the student's ability to apply the definitions, concepts and properties learned and to solve exercises.

The exam consists of 5-10 questions, both multiple choice and open-ended, some of which must be solved in R. The final grade is out of thirty. Students who have passed the exam but do not feel that the grade reflects their preparation can request an additional (optional) oral exam, to be held within 5-7 days, which can change the grade by +/-3 points. Please note: the difficulty of the oral questions will be calibrated to the written exam grade. Questions will cover theoretical concepts and exercises from the entire syllabus.

During the written exam, students can only use the cheat sheet that is provided on virtuale.unibo.it, containing references to R packages and functions.

Students may not use the textbook, personal notes, artificial intelligence tools, or mobile phones. Smartwatches and similar electronic data storage or communication devices are also prohibited and must be switched off before the exam begins.

Cheating or the use of unauthorized devices is strictly prohibited. Any violation will result in the annulment of the exam and will be reported to the appropriate Division or Campus authorities in accordance with Article 48 of the University’s Code of Ethics.

To take the exam, students must register via the AlmaEsami platform. Students who do not register will not be allowed to take the exam.

Exams can only be taken during official sessions. No exceptions.

Students with learning disorders and/or temporary or permanent disabilities: please contact the responsible office (https://site.unibo.it/studenti-con-disabilita-e-dsa/en/for-students) as soon as possible so that it can propose suitable adjustments. The request for adjustments must be submitted to the lecturer in advance (15 days before the exam date); the lecturer will assess the appropriateness of the adjustments, taking into account the teaching objectives.


Teaching tools

The following material will be provided: slides of the lectures, exercises with solutions, mock exam.


Office hours

See the website of Laura Anderlucci