Course unit description


This course contributes to the pursuit of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education; Industry, innovation and infrastructure

Academic Year 2020/2021

Learning outcomes

By the end of the course, the student has an in-depth understanding of the computational requirements of machine learning workloads. The student knows the main architectures for accelerating such workloads, the main heterogeneous architectures for embedded machine learning, and the main cloud platforms that provide dedicated support for machine/deep learning applications.


Architectures (modules 1+3):

  • Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs): recap and notation
  • Computational characteristics of DNN Training vs Inference
  • Evaluating DNN processors: accuracy, throughput, efficiency, footprint
  • Computational kernels for DNNs: Matrix Multiplication (Toeplitz)
  • Strassen, Winograd and FFT algorithms for DNNs
  • Spatial and Temporal data reuse in DNNs; dataflow taxonomy (brief notes)
  • Accelerating DNNs on GPU: deep dive on NVIDIA Ampere architecture and Tensor Cores
  • Reducing the memory footprint of DNNs: data tiling and quantization (brief notes)
  • DNNs on microcontrollers: deep dive on PULP architecture
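As an illustration of the "Matrix Multiplication (Toeplitz)" topic listed above, the following is a minimal Python sketch of how a convolution can be rewritten as a matrix product by unrolling input patches (often called im2col). It is not taken from the course material: the function names `conv2d_direct`, `im2col` and `conv2d_matmul` are hypothetical, and the sketch assumes a single-channel input, a single filter, stride 1 and no padding. With several filters, the final step becomes a full matrix-matrix multiplication.

```python
def conv2d_direct(x, w):
    """Direct 2-D convolution (cross-correlation form, stride 1, no padding)."""
    H, W = len(x), len(x[0])
    R, S = len(w), len(w[0])
    return [
        [
            sum(x[i + r][j + s] * w[r][s] for r in range(R) for s in range(S))
            for j in range(W - S + 1)
        ]
        for i in range(H - R + 1)
    ]


def im2col(x, R, S):
    """Unroll every RxS input patch into one row of a matrix.

    Input values are duplicated across rows, which trades memory
    for the ability to use a plain matrix product.
    """
    H, W = len(x), len(x[0])
    return [
        [x[i + r][j + s] for r in range(R) for s in range(S)]
        for i in range(H - R + 1)
        for j in range(W - S + 1)
    ]


def conv2d_matmul(x, w):
    """Same convolution, computed as (patch matrix) x (flattened filter)."""
    R, S = len(w), len(w[0])
    out_w = len(x[0]) - S + 1
    patches = im2col(x, R, S)
    w_flat = [w[r][s] for r in range(R) for s in range(S)]
    flat = [sum(p * q for p, q in zip(row, w_flat)) for row in patches]
    # Fold the flat result back into the 2-D output shape.
    return [flat[k:k + out_w] for k in range(0, len(flat), out_w)]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w = [[1, 0], [0, -1]]
print(conv2d_direct(x, w))  # [[-4, -4], [-4, -4]]
print(conv2d_matmul(x, w))  # same result via the unrolled matrix
```

The unrolled matrix here has 4 rows of 4 values for a 9-value input, which shows the data duplication that makes this mapping memory-hungry.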

Platforms (module 2):

  • A brief introduction to parallel programming patterns (embarrassingly parallel, decomposition, scatter/gather, scan, reduce, ...)
  • Shared-Memory programming with OpenMP
    • OpenMP programming model
    • The “omp parallel” construct
    • Scoping constructs
    • Other work-sharing constructs
    • Some examples of applications
  • GPU programming with CUDA
    • CUDA architecture and terminology
    • CUDA programming model
    • CUDA memory hierarchy
    • CUDA/C programming constructs
    • Some examples of applications
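To make two of the parallel programming patterns named above more concrete, here is a small Python sketch of reduce and scan. It is an illustration only, not course code: the course itself works with OpenMP and CUDA in C, while this sketch runs sequentially and only mirrors the structure a parallel version would have. The names `tree_reduce` and `inclusive_scan` are hypothetical, and both functions assume the operator `op` is associative.

```python
import operator


def tree_reduce(values, op):
    """Pairwise (tree) reduction of a non-empty sequence."""
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    while len(data) > 1:
        # All pairs within one pass are independent of each other,
        # so a parallel runtime could combine them concurrently.
        paired = [op(data[i], data[i + 1]) for i in range(0, len(data) - 1, 2)]
        if len(data) % 2:
            paired.append(data[-1])  # odd element is carried to the next pass
        data = paired
    return data[0]


def inclusive_scan(values, op):
    """Inclusive prefix scan using doubling strides (Hillis-Steele style)."""
    data = list(values)
    stride = 1
    while stride < len(data):
        # Each new element reads only values from the previous pass,
        # so all positions could be updated in parallel.
        data = [
            data[i] if i < stride else op(data[i - stride], data[i])
            for i in range(len(data))
        ]
        stride *= 2
    return data


print(tree_reduce([1, 2, 3, 4, 5], operator.add))     # 15
print(inclusive_scan([1, 2, 3, 4, 5], operator.add))  # [1, 3, 6, 10, 15]
```

Both functions finish in a number of passes logarithmic in the input length, which is why these formulations are preferred over a left-to-right loop when many workers are available.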


Main suggested reading for the Architectures module:

Efficient Processing of Deep Neural Networks
Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer
Synthesis Lectures on Computer Architecture, June 2020, Vol. 15, No. 2, Pages 1-341

Main suggested readings for the Platforms module: selected parts from the following books

An Introduction to Parallel Programming
Peter Pacheco
Morgan Kaufmann, 2011, ISBN 978-0123742605

CUDA C Programming Guide
NVIDIA Corporation

Background reading on Deep Learning:

Deep Learning
Ian Goodfellow, Yoshua Bengio and Aaron Courville
MIT Press, 2016

Knowledge of computer architecture and basic programming is a prerequisite for the course.

Teaching methods

Lectures and laboratory exercises on the student's own device.

The teaching language of this course is English.

Assessment methods

Modules 1+3 (Architectures) are jointly evaluated with a written exam followed by an oral discussion. Students can opt to replace the written part of the exam with a time-limited mini-project.

Module 2 (Platforms) will be evaluated with a programming project and a written report.

The final exams of the two modules are independent and can be taken in any order. The final grade will be computed as the average of the final evaluations of Modules 1+3 and Module 2, rounded to the nearest integer. Honors (“lode”) will be assigned by the instructors for exceptional work only.
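The averaging rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the official grading procedure: the function name `final_grade` is hypothetical, the two module evaluations are assumed to be integers, and the text does not say how an average ending in .5 is rounded, so the tie rule is exposed as an explicit parameter rather than decided here.

```python
def final_grade(architectures, platforms, half_up=True):
    """Average two integer module grades and round to the nearest integer.

    ASSUMPTION: the handling of a .5 average is not specified in the
    course description; `half_up` only makes the choice explicit.
    """
    total = architectures + platforms
    if total % 2 == 0:
        return total // 2  # exact average, no rounding needed
    # Odd total: the average ends in .5, so the tie rule decides.
    return total // 2 + 1 if half_up else total // 2


print(final_grade(28, 30))                  # 29
print(final_grade(27, 30))                  # 29 (28.5, tie rounded up)
print(final_grade(27, 30, half_up=False))   # 28 (28.5, tie rounded down)
```

Because the average of two integers can only be a whole number or end in .5, the tie rule is the only rounding case that ever arises.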

Teaching tools

Annotated slides and additional teaching materials are available online.

All materials will be shared by means of the official Insegnamenti On Line (IOL) site of the course.

Office hours

See the website of Francesco Conti

See the website of Luca Benini

See the website of Moreno Marzolla