86464 - Algorithms and Systems for Big Data Processing

SDGs

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education; Industry, innovation and infrastructure

Academic Year 2020/2021

Learning outcomes

The course module explores the techniques and hardware required by the ever-increasing amount and complexity of the data available for analysis and decision-making, i.e., so-called Big Data (BD). The topic is tackled from several points of view: from the basic tools used to process digital data, through techniques that process the whole dataset in a streaming fashion, to the characterization of Big Data processing systems.

At the end of Module 1, the student has a complete understanding of the basic algorithms used to process large amounts of data and extract useful information, as well as of the software frameworks used for data analytics. In Module 2, the student learns the structure and architecture of computing systems used for data processing in both large-scale (high-performance and cloud) and small-scale (embedded) contexts, with particular emphasis on GPUs and their applications.

Course contents

Algorithms (module 1 – Prof. Mauro Mangia):

  • Introduction to Python: packages for data processing and visualization
  • Data analytics: linear algebra for machine learning, basics of statistics and signal transformations
  • Low-level signal processing: definition of digital filters, advanced filtering, power spectral estimation
  • Basics of machine learning: autoregressive models, dimensionality reduction, clustering, classification problems, basics of neural networks, autoencoders
  • Streaming algorithms: basic streaming algorithms for feature extraction, streaming approaches for the PCA/PSA problem.
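
As an illustration of what a basic streaming algorithm for feature extraction can look like, the sketch below computes the mean and variance of a data stream in a single pass, without storing the samples. It uses Welford's online algorithm, which is chosen here only as a representative example; this page does not state which specific algorithms the module covers, and the class name RunningStats and the sample values are hypothetical.

```python
import math


class RunningStats:
    """Single-pass mean and variance of a stream (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Consume one sample; memory use stays constant."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the updated mean
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; not defined for fewer than two samples."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


# Hypothetical stream of measurements, processed one sample at a time
stats = RunningStats()
for sample in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(sample)

print(stats.mean, stats.variance)
```

Each sample is seen exactly once and only three numbers are kept in memory, which is the property that makes this kind of approach suitable for data that cannot be stored in full.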

Systems (module 2 – Prof. Francesco Conti):

  • Basics of computer architecture: from high-level languages to Instruction Set Architecture; memory hierarchy; in-order and out-of-order processors.
  • Evaluating computers: latency and throughput, memory bandwidth, energy efficiency metrics.
  • Heterogeneous acceleration: concepts of parallel computing; GPU Architecture and basics of CUDA programming.
  • Big Data on large-scale computing systems: warehouse and streaming computing; high-performance and distributed computing (main concepts); practical applications.
  • Big Data on small-scale computers: data processing on embedded platforms; microcontrollers and DSP; SIMD extensions; practical examples from the automotive market.

Readings/Bibliography

Main suggested readings for Algorithms module:

  • Digital Signal Processing: signals, systems and filters
    Andreas Antoniou
    McGraw-Hill (2006)
  • Outlier Analysis
    Charu C. Aggarwal
    Springer (2nd edition, 2017)
  • Deep Learning
    Ian Goodfellow, Yoshua Bengio, Aaron Courville
    MIT Press (2016)

Main suggested readings for Systems module:

  • Computer Organization and Design RISC-V Edition: The Hardware Software Interface
    David A. Patterson, John L. Hennessy
    Morgan Kaufmann (2017)
  • Programming Massively Parallel Processors: A Hands-on Approach
    David B. Kirk, Wen-mei W. Hwu
    Morgan Kaufmann (3rd edition, 2016)

Basic programming skills are required for the course.

Teaching methods

Frontal lectures and laboratory exercises carried out on the student's own laptop.

The teaching language of this course is English.

Assessment methods

Learning is assessed by means of a joint oral exam for the two modules, covering the topics discussed during the frontal lectures. Students can opt to carry out a mini-project, which is then discussed as part of the oral exam.

Teaching tools

Annotated slides and additional teaching materials available online.

All materials will be shared by means of the official Insegnamenti On Line (IOL) site of the course.

Office hours

See the website of Francesco Conti

See the website of Mauro Mangia