91266 - Machine Learning for Computer Vision



This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education

Academic Year 2021/2022

Learning outcomes

At the end of the course, the student masters the most popular modern machine-learning approaches to computer-vision tasks, with particular reference to specialized deep-learning architectures. The student has both a theoretical understanding and the necessary practical skills required to develop state-of-the-art image and video analysis systems for real-world applications.

Course contents

The course is held in the first semester, from September to December.


  1. Image classification/recognition: limits of hand-crafted methods; brief review of machine learning and data-driven methods; bag-of-words models applied to dense descriptors; introduction to Neural Networks (NNs) and Convolutional NNs; AlexNet and the deep learning revolution. Advanced and state-of-the-art architectures for image classification. Hands-on sessions on image classification: training from scratch and transfer learning.
  2. Intro to PyTorch and practical aspects of model training. Discussion of notebooks provided by the instructor to get to know PyTorch and learn how to train CNNs for image classification; discussion on the impact of several design and optimization choices.
  3. Object detection. Introduction to ensemble learning via boosting. The Viola-Jones detector and its applications. Specialized NN architectures for object detection. Two-stage, one-stage, and anchor-free detectors. RoI Pooling operator, Feature Pyramid Networks. Imbalanced learning and the focal loss. Hands-on session on training a state-of-the-art network on a small dataset.
  4. Dense prediction problems: semantic/instance segmentation and depth from mono/stereo. Ensemble learning via bagging and random forests. The algorithm behind the Kinect body part segmentation. Fully Convolutional Networks. Transposed and dilated convolutions. RoI Align operator. Specialized NN architectures for semantic, instance, and panoptic segmentation. Deep networks for depth estimation: DispNet, GCNet, Monodepth.
  5. Metric learning. Deep metric learning and its applications to face recognition/identification and beyond. Locally connected layers. Contrastive and triplet loss. Hands-on session on face recognition.
  6. Attention and transformers. Image classification and object detection architectures based on Transformers.
  7. 3D Computer Vision. Intro to 3D data: point clouds, meshes, voxel grids, normals. Specialized architectures to classify and segment point clouds.
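
As an illustration of two of the loss functions named in the list above (the focal loss in item 3 and the triplet loss in item 5), here is a minimal PyTorch sketch. It is not taken from the course material: the function name `binary_focal_loss`, the default values of `alpha` and `gamma`, and the toy tensors are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F


def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss (hypothetical helper, defaults are assumptions).

    Down-weights well-classified examples by the factor (1 - p_t) ** gamma,
    so that training focuses on hard, often rare, examples.
    """
    # Per-element binary cross-entropy computed from raw logits.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t is the probability the model assigns to the true class.
    p_t = p * targets + (1 - p) * (1 - targets)
    # alpha_t balances the positive and negative classes.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()


# Toy detection-style scores: one positive among several negatives.
logits = torch.tensor([3.0, -2.0, -4.0, -0.5])
targets = torch.tensor([1.0, 0.0, 0.0, 0.0])
focal = binary_focal_loss(logits, targets)

# Triplet loss on toy embeddings: pull anchor and positive together and
# push the negative at least `margin` further away.
torch.manual_seed(0)
anchor = F.normalize(torch.randn(4, 16), dim=1)
positive = F.normalize(anchor + 0.05 * torch.randn(4, 16), dim=1)
negative = F.normalize(torch.randn(4, 16), dim=1)
triplet = F.triplet_margin_loss(anchor, positive, negative, margin=0.2)
```

With `gamma=0` the modulating factor disappears and the focal loss reduces to a class-weighted cross-entropy, which is a convenient sanity check when experimenting with it.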


Prerequisites

  1. Basic knowledge of the Python programming language
  2. Basic knowledge of computer vision and image processing: image formation, image digitization, camera modelling and calibration, basic image manipulation and processing, local image features, basics of stereo vision.

If you attended Computer Vision and Image Processing, you already fulfill the first and second prerequisites. If you didn't, you can ask Prof. Di Stefano for access to his slides and lab sessions.

Nice-to-have knowledge, but not prerequisites

  1. Basic knowledge of machine learning: supervised versus unsupervised learning, classification versus regression, underfitting and overfitting, regularization; splitting data into training, validation, and test sets; hyper-parameters and cross-validation.
  2. Basic knowledge of PyTorch: a good intro is available at https://pytorch.org/tutorials/beginner/basics/intro.html

We will briefly review the basic ML knowledge at the beginning of the course, and the instructor will provide notebooks to introduce PyTorch and discuss them in class.
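
For readers unfamiliar with the training, validation, and test split mentioned above, the following is a minimal sketch of how it might look in PyTorch. The random data, the 70/15/15 proportions, and the variable names are assumptions made for illustration and do not come from the course notebooks.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset: 100 random feature vectors with binary labels (assumed data).
features = torch.randn(100, 8)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(0)

# 70 samples to fit the model, 15 to tune hyper-parameters,
# 15 held out for the final performance estimate.
train_set, val_set, test_set = random_split(
    dataset, [70, 15, 15], generator=generator
)
```

The validation set is used to choose hyper-parameters, while the test set is touched only once at the end, so that the reported performance is not biased by those choices.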


The main reference material will be the slides and notes provided on Virtuale by the instructor. A set of pointers to scientific papers and technical reports for each topic will also be provided during lectures.

Several freely available online resources can complement the material provided by the instructor on Virtuale.

Teaching methods

Taught lessons.

Theory lessons are complemented by in-class hands-on sessions, where selected topics will be studied from a practical point of view by using the Python language and the PyTorch library.
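
To give a flavour of what such a hands-on session might involve, here is a minimal sketch of one transfer-learning training step in PyTorch. It is not one of the instructor's notebooks: the tiny `backbone` is only a stand-in for a genuinely pretrained network, and the 5-class task, batch size, and learning rate are assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for a pretrained feature extractor (hypothetical; in practice its
# weights would be loaded from a checkpoint trained on a large dataset).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
# New classification head for an assumed 5-class target task.
head = nn.Linear(8, 5)

# Freeze the backbone so that only the new head is trained.
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()

model = nn.Sequential(backbone, head)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Random tensors in place of a real image batch (assumed shapes).
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 5, (16,))

# Snapshots used only to check which weights the step modifies.
backbone_before = [p.clone() for p in backbone.parameters()]
head_before = head.weight.clone()

# One optimization step: forward pass, loss, backward pass, update.
logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Training from scratch would differ only in leaving all parameters trainable and passing `model.parameters()` to the optimizer.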

Assessment methods

The assessment is an oral exam on the topics covered during the course, consisting of two parts.

In the first part, students will present a recent scientific paper, assigned to them in advance, that relates to the course topics.

In the second part, students will answer questions on the theory presented in the course.

Teaching tools

PowerPoint slides (PDF printouts are available on the course website before lectures) are projected and discussed during class hours.

Jupyter notebooks introducing PyTorch and covering all the hands-on sessions will be available on the course website.

Office hours

See the website of Samuele Salti