95780 - Data Science (1) (Lm)

Academic Year 2021/2022

  • Teacher: Silvio Peroni
  • Credits: 6
  • SSD: INF/01
  • Language: English
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Digital Humanities and Digital Knowledge (cod. 9224)

Learning outcomes

At the end of the course, the student knows the theoretical and practical groundings for modelling, gathering and managing data using computational techniques. The student can (a) write and share data using standard formats for spreadsheet and Web consumption, (b) understand and create databases through database management systems, (c) retrieve data using appropriate query languages, (d) build and interpret graphs showing basic descriptive statistics computed from data, and (e) develop and integrate data-driven workflows into Python applications.
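As a rough illustration of how outcomes (a) to (d) might fit together in a single data-driven workflow, the sketch below writes a few records as CSV, loads them into a relational database, queries them with SQL, and computes basic descriptive statistics. It is only a minimal sketch under stated assumptions: the sample records, the `publication` table, and all variable names are invented for illustration and are not taken from the course material, and it uses only the Python standard library rather than the tools adopted in the hands-on sessions.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical sample data (invented for illustration): titles and years.
rows = [
    ("Title A", 2019),
    ("Title B", 2020),
    ("Title C", 2020),
    ("Title D", 2021),
]

# (a) Write the data in a standard, spreadsheet-friendly format (CSV).
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["title", "year"])
writer.writerows(rows)
csv_text = buffer.getvalue()

# (b) Create and populate a relational database (in-memory SQLite).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE publication (title TEXT, year INTEGER)")
reader = csv.DictReader(io.StringIO(csv_text, newline=""))
connection.executemany(
    "INSERT INTO publication (title, year) VALUES (?, ?)",
    [(record["title"], int(record["year"])) for record in reader],
)

# (c) Retrieve data with a query language (SQL).
cursor = connection.execute(
    "SELECT year FROM publication WHERE year >= ? ORDER BY year", (2020,)
)
years = [year for (year,) in cursor]

# (d) Compute basic descriptive statistics on the query result.
summary = {
    "count": len(years),
    "mean": statistics.mean(years),
    "median": statistics.median(years),
}
print(summary)
connection.close()
```

Keeping each step in its own short section mirrors the lettered outcomes, so any step could be swapped for a different tool without touching the others.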

Course contents

The course is organised as a series of theoretical lectures and hands-on sessions. In each lecture, I give a theoretical introduction to the topic of that lecture. In each hands-on session, which is held at a computer, I run a laboratory activity based on existing tools that allow students to experiment with the topics introduced in the theoretical lectures.

List of lectures and hands-on sessions

  • [Lecture] Introduction to the course and final project specifications
  • [Lecture] What is a datum and how it can be represented computationally
  • [Hands-on] Data formats and methods for storing data in Python
  • [Lecture] Introduction to data modelling
  • [Hands-on] Implementation of data models via Python classes
  • [Lecture] Processing and querying the data
  • [Hands-on] Introduction to Pandas
  • [Lecture] Database Management Systems
  • [Hands-on] Configuring and populating a relational database
  • [Lecture] SQL, a query language for relational databases
  • [Hands-on] Configuring and populating a graph database
  • [Lecture] SPARQL, a query language for RDF databases
  • [Hands-on] Interacting with databases using Pandas
  • [Lecture] Describing and visualising data
  • [Hands-on] Descriptive statistics and graphs about data using Pandas
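To give a flavour of what a session such as "Implementation of data models via Python classes" could involve, the following minimal sketch models two related entities as classes and stores them in a standard data format (JSON). The `Person` and `Publication` classes, their fields, and the sample values are hypothetical and chosen only for illustration; they are not the data model used in the course.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Person:
    # Hypothetical entity: an author identified by two name parts.
    given_name: str
    family_name: str


@dataclass
class Publication:
    # Hypothetical entity: a publication with zero or more authors.
    title: str
    year: int
    authors: list = field(default_factory=list)


def publication_from_dict(data):
    """Rebuild a Publication (and its Person objects) from plain dictionaries."""
    authors = [Person(**author) for author in data["authors"]]
    return Publication(title=data["title"], year=data["year"], authors=authors)


# Build an instance of the model with invented sample values.
original = Publication(
    title="An example title",
    year=2021,
    authors=[Person("Ada", "Example"), Person("Alan", "Sample")],
)

# Serialise the object to JSON text, then read it back into the model.
json_text = json.dumps(asdict(original), indent=2)
restored = publication_from_dict(json.loads(json_text))

print(restored == original)
```

Data classes are used here only because they keep the example short; plain classes with an explicit `__init__` method would express the same model.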

Readings/Bibliography

Lecture notes will be made freely available to students in the GitHub repository of the course before the beginning of each lecture. Slides and any additional material will also be made available in the same repository a few days before each lecture. No additional books or papers are needed to pass the final exam.

Due to the practical focus of the course, prior knowledge of and practice with computational thinking (e.g. algorithms, data structures, and algorithmic techniques) and Python are highly recommended.

A minimal bibliography on the two topics mentioned above is:

Teaching methods

Face-to-face classes for 30 hours.

Assessment methods

The exam consists of:

  1. the implementation of a project;
  2. an oral colloquium on the project implemented, for assessing the skills gained by the student.

Students must organise themselves into groups of 3-4 people to implement the project. The personal contribution of each group member will be assessed during the oral colloquium.

The final evaluation of the student is based on the scores obtained for each of the two points above. In particular:

  • excellent evaluation: active involvement in the development of the project following all the theoretical principles and practical guidelines provided to the student during the lectures and the hands-on sessions;
  • sufficient evaluation: providing a minor contribution to the development of the project;
  • insufficient evaluation: not providing any contribution to the project.

Although it is discouraged, it is possible to take the course as a non-attending student. Non-attending students should discuss the topic of the project with the professor in advance.

Teaching tools

Classes are held in a classroom equipped with personal computers connected to the Intranet and Internet.

Theory lessons will always be accompanied by hands-on sessions. All the course material, including lecture notes and slides, will be made available in the GitHub repository of the course. A group in a free messaging application will be set up so that all the students of the course can communicate directly with each other and with the professor.

Links to further information

https://github.com/comp-data/2021-2022/

Office hours

See the website of Silvio Peroni

SDGs

Quality education

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.