91266 - Machine Learning for Computer Vision

Academic Year 2023/2024

  • Teacher: Samuele Salti
  • Credits: 6
  • SSD: ING-INF/05
  • Language: English
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Artificial Intelligence (code 9063)

Learning outcomes

At the end of the course, the student masters the most popular modern machine-learning approaches to computer-vision tasks, with particular reference to specialized deep-learning architectures. The student has both the theoretical understanding and the practical skills required to develop state-of-the-art image and video analysis systems for real-world applications.

Course contents

The course is held in the first semester, from September to December.


  1. Practical aspects of model training. Regularization, optimizers, training recipes.
  2. Attention and vision transformers. Image classification architectures based on Transformers (ViT, Swin). ConvNeXt.
  3. Object detection. Introduction to ensemble learning via boosting. The Viola-Jones detector and its applications. Specialized NN architectures for object detection. Two-stage, one-stage, and anchor-free detectors. RoI Pooling operator, Feature Pyramid Networks. Imbalanced learning and the focal loss. Hands-on session on object detection.
  4. Dense prediction problems: semantic/instance segmentation and depth from mono/stereo. Ensemble learning via bagging and random forests. The algorithm behind the Kinect body part segmentation. Fully Convolutional Networks. Transposed and dilated convolutions. RoI Align operator. Specialized NN architectures for semantic, instance, and panoptic segmentation. Deep networks for depth estimation: DispNet, GC-Net, RAFT-Stereo, Monodepth.
  5. Metric and representation learning. Deep metric learning and its applications to face recognition/identification and beyond. Locally connected layers. Contrastive and triplet loss. Unsupervised representation learning. Hands-on session on face recognition.
  6. 3D computer vision: data structures (point clouds, meshes, voxel grids). Specialized neural networks for point clouds and voxels. Hands-on session on point cloud classification.
  7. Image generation with diffusion models: denoising diffusion probabilistic models and score-matching models. Stable Diffusion and text-guided image generation. Hands-on session on textual inversion.


Prerequisites

  1. Basic knowledge of computer vision and image processing: image formation, image digitization, camera modelling and calibration, basic image manipulation and processing, local image features, basics of stereo vision.
  2. Basic knowledge of PyTorch: a good intro is available at https://pytorch.org/tutorials/beginner/basics/intro.html
  3. Basic knowledge of machine learning: supervised versus unsupervised learning, classification versus regression, underfitting and overfitting, regularization; splitting data into training, validation and test sets; hyper-parameters and cross-validation.

If you attended Computer Vision and Image Processing (taught either by Prof. Lisanti and me or by Prof. Di Stefano), you already fulfill these prerequisites. If you didn't, you can find the slides and lab sessions of that course on Virtuale.


Readings

The main reference material will be the slides and notes provided on Virtuale by the instructor. A set of pointers to scientific papers and technical reports for each topic will also be provided during lectures.

Several freely available online resources can be used to complement the material provided by the instructor on Virtuale.

Teaching methods

Taught lessons.

Theory lessons are complemented by in-class hands-on sessions, where selected topics will be studied from a practical point of view by using the Python language and the PyTorch library.

Assessment methods

The assessment is an oral exam, in two parts, on the topics covered during the course.

In the first part, students will present a recent scientific paper related to the course topics, agreed on in advance with the instructor.

In the second part, students will answer questions on the theory presented in the course.

Teaching tools

PowerPoint slides are projected and discussed during class hours; PDF printouts are available on the course website before lectures.

Jupyter notebooks of all the hands-on sessions will be available on the course website. 

Office hours

See the website of Samuele Salti


This teaching activity contributes to the achievement of the following Sustainable Development Goals of the UN 2030 Agenda: Quality education; Industry, innovation and infrastructure.