- Teacher: Luca Benini
- Credits: 6
- SSD: INF/01
- Language: English
- Modules: Luca Benini (Module 1); Gianluigi Zavattaro (Module 2)
- Teaching Mode: Traditional lectures (Module 1); Traditional lectures (Module 2)
- Campus: Bologna
Course:
Second cycle degree programme (LM) in Artificial Intelligence (cod. 9063)
Also valid for Second cycle degree programme (LM) in Electronic Engineering (cod. 0934)
Learning outcomes
At the end of the course, the student has a deep understanding of the requirements that machine-learning workloads place on computing systems, and understands the main architectures for accelerating machine-learning workloads, the heterogeneous architectures used for embedded machine learning, and the most popular platforms that cloud providers offer specifically to support machine-learning and deep-learning applications.
Course contents
Module 1 (for students of 93398 and 91259, by Prof. L. Benini)
- From ML to DNNs - a computational perspective
  - Introduction to key computational kernels (dot-product, matrix multiply...)
  - Inference vs training - workload analysis and characterization
  - The NN computational zoo: DNNs, CNNs, RNNs, GNNs, Attention-based Networks
- Running ML workloads on programmable processors
  - Recap of processor instruction set architecture (ISA) with focus on data processing
  - Improving processor ISAs for ML: RISC-V and ARM use cases
  - Fundamentals of parallel processor architecture and parallelization of ML workloads
- Algorithmic optimizations for ML
  - Key bottlenecks and taxonomy of optimization techniques
  - Algorithmic techniques: Strassen, Winograd, FFT
  - Topology optimization: efficient NN models - depthwise convolutions, inverted bottleneck, introduction to Neural Architecture Search
Module 2 (for students of 93398, by Prof. F. Conti)
- Representing data in Deep Neural Networks
  - Recap of canonical DNN loops – a tensor-centric view
  - Data quantization in Deep Neural Networks
  - Brief notes on data pruning
- From training to software-based deployment
  - High-performance embedded systems (NVIDIA Xavier, Huawei Ascend)
  - Microcontroller-based systems (STM32)
- From software to hardware acceleration
  - Principles of DNN acceleration: spatial and temporal data reuse; dataflow loop nests and taxonomy; data tiling
  - The Neural Engine zoo: convolvers, matrix product accelerators, systolic arrays – examples from the state-of-the-art
Module 2 (for students of 91259, by Prof. G. Zavattaro)
- Introduction to parallel programming
- Parallel programming patterns: embarrassingly parallel, decomposition, master/worker, scan, reduce, ...
- Shared-memory programming with OpenMP
- OpenMP programming model: the “omp parallel” construct, scoping constructs, other work-sharing constructs
- Some examples of applications
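As an illustration of the OpenMP topics listed above, the following is a minimal sketch in C. It is not taken from the course material. It combines the "omp parallel" construct with a work-sharing loop, explicit data scoping, and the reduce pattern. The function name dot_product is hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical example: dot product of two vectors of length n.
 * "parallel for" creates a team of threads and splits the loop
 * iterations among them (work-sharing).
 * default(none) forces every variable used in the region to be scoped
 * explicitly: a, b and n are shared, the loop index i is private,
 * and sum is combined across threads by the reduction clause. */
double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;

    #pragma omp parallel for default(none) shared(a, b, n) reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        sum += a[i] * b[i];
    }

    return sum;
}
```

With GCC, OpenMP support is enabled with the -fopenmp flag. If the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially, producing the same result for inputs like small integers where summation order does not affect rounding.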
Readings/Bibliography
Refer to Virtuale
Teaching methods
Theory is taught through traditional lectures. In addition, both Module 1 and Module 2 include hands-on sessions for which students need their own laptop.
Assessment methods
Written exam with oral discussion
Teaching tools
Refer to Virtuale
Office hours
See the website of Luca Benini
See the website of Gianluigi Zavattaro