91259 - Architecture and Platforms for Artificial Intelligence

Course Unit Page


This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education · Industry, innovation and infrastructure

Academic Year 2020/2021

Learning outcomes

At the end of the course, the student has a deep understanding of the requirements that machine-learning workloads place on computing systems; knows the main architectures for accelerating machine-learning workloads, including heterogeneous architectures for embedded machine learning; and is familiar with the most popular platforms offered by cloud providers specifically to support machine/deep-learning applications.

Course contents

Architectures (modules 1+3):

  • Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs): recap and notation
  • Computational characteristics of DNN Training vs Inference
  • Evaluating DNN processors: accuracy, throughput, efficiency, footprint
  • Computational kernels for DNNs: Matrix Multiplication (Toeplitz)
  • Strassen, Winograd and FFT algorithms for DNNs
  • Spatial and Temporal data reuse in DNNs; dataflow taxonomy (brief notes)
  • Accelerating DNNs on GPU: deep dive on NVIDIA Ampere architecture and Tensor Cores
  • Reducing DNNs’ memory footprint: data tiling and quantization (brief notes)
  • DNNs on microcontrollers: deep dive on PULP architecture

Platforms (module 2):

  • A brief introduction to parallel programming patterns (embarrassingly parallel, decomposition, scatter/gather, scan, reduce, ...)
  • Shared-Memory programming with OpenMP
    • OpenMP programming model
    • The “omp parallel” construct
    • Scoping constructs
    • Other work-sharing constructs
    • Some examples of applications
  • GPU programming with CUDA
    • CUDA architecture and terminology
    • CUDA programming model
    • CUDA memory hierarchy
    • CUDA/C programming constructs
    • Some examples of applications


Main suggested reading for Architectures module:

Efficient Processing of Deep Neural Networks
Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer
Synthesis Lectures on Computer Architecture, June 2020, Vol. 15, No. 2, pp. 1–341

Main suggested readings for the Platforms module: selected parts from the following books

An Introduction to Parallel Programming
Peter Pacheco
Morgan Kaufmann, 2011, ISBN 978-0123742605

CUDA C Programming Guide
NVIDIA Corporation

Background reading on Deep Learning:

Deep Learning
Ian Goodfellow, Yoshua Bengio and Aaron Courville
MIT Press, 2016

Knowledge of computer architecture and basic programming skills are mandatory prerequisites for the course.

Teaching methods

Frontal lectures + laboratory exercises carried out on the student's own device.

The teaching language of this course is English.

Assessment methods

Modules 1+3 (Architectures) are jointly evaluated with a written exam followed by an oral discussion. Students can opt to replace the written part of the exam with a time-limited mini-project.

Module 2 (Platforms) will be evaluated with a programming project + written report.

The final exams of the two modules are independent and can be taken in any order. The final grade will be computed as the average of the final evaluations of Modules 1+3 and Module 2, rounded to the nearest integer. Honors (“lode”) will be assigned by the instructors for exceptional work only.

Teaching tools

Annotated slides and additional teaching materials available online.

All materials will be shared by means of the official Insegnamenti On Line (IOL) site of the course.

Office hours

See the website of Francesco Conti

See the website of Luca Benini

See the website of Moreno Marzolla