- Teacher: Samuele Salti
- Credits: 6
- SSD: ING-INF/05
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Bologna
- Degree programme: Second cycle degree programme (LM) in Artificial Intelligence (cod. 9063)
- From Sep 15, 2025 to Dec 17, 2025
Learning outcomes
At the end of the course, the student masters the most popular modern machine-learning approaches to computer-vision tasks, with particular reference to specialized deep-learning architectures. The student has both a theoretical understanding and the necessary practical skills required to develop state-of-the-art image and video analysis systems for real-world applications.
Course contents
The course is held in the first semester, from September to December.
Topics:
- Advanced CNNs: ResNeXt and grouped convolutions, MobileNets, EfficientNet and RegNet.
- Attention and vision Transformers. Transformers and attention. Image classification architectures based on Transformers.
- Object detection. Introduction to ensemble learning via boosting. The Viola-Jones detector and its applications. Specialized NN architectures for object detection. Two-stage, one-stage, and anchor-free detectors. Feature Pyramid Networks. Imbalanced learning and the focal loss. DEtection TRansformer (DETR) and the Hungarian loss. Hands-on session on object detection.
- Semantic/instance/panoptic segmentation. Ensemble learning via bagging and random forests. The algorithm behind the Kinect body part segmentation. Transposed and dilated convolutions. Fully Convolutional Networks, U-Net, DeepLab. Instance segmentation and Mask R-CNN. Panoptic segmentation, MaskFormer.
- Depth estimation from monocular images: photometric loss and Monodepth.
- Metric and representation learning. Deep metric learning and its applications to face recognition/identification and beyond. Contrastive and triplet loss, ArcFace, NT-Xent loss, CLIP. Hands-on session on metric learning.
- Image generation with GANs and diffusion models: metrics for generative tasks. Generative Adversarial Networks and Denoising Diffusion Probabilistic Models. Stable Diffusion and text-guided image generation. Hands-on session on textual inversion.
Prerequisites:
- Computer vision and image processing: image formation, image digitization, camera modelling and calibration, basic image manipulation and processing, local image features.
- PyTorch: a good intro is available at https://pytorch.org/tutorials/beginner/basics/intro.html
- Machine learning: supervised versus unsupervised learning, classification versus regression, underfitting and overfitting, regularization; data split in training, validation and test sets; hyper-parameters and cross-validation.
If you attended Computer Vision and Image Processing (taught either by Prof. Lisanti and me or by Prof. Di Stefano), you already fulfill these prerequisites. If you didn't, you can find the slides and lab sessions of that course on Virtuale.
Readings/Bibliography
The main reference material will be the slides and notes provided on Virtuale by the instructor. A set of pointers to scientific papers and technical reports for each topic will also be provided during lectures.
Several freely available online resources can be useful to complement the material provided by the instructor on Virtuale.
- https://udlbook.github.io/udlbook/ - Simon J.D. Prince, "Understanding Deep Learning", MIT Press, 2023.
- http://d2l.ai/ - Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, "Dive into Deep Learning", 2020.
- http://www.deeplearningbook.org/ - Ian Goodfellow, Yoshua Bengio, and Aaron Courville, "Deep Learning", 2016.
- https://github.com/fastai/fastbook - Jeremy Howard and Sylvain Gugger, "Deep Learning for Coders with fastai and PyTorch", 2020.
- https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf - Eli Stevens, Luca Antiga, and Thomas Viehmann, "Deep Learning with PyTorch", July 2020.
Teaching methods
Lectures are complemented by in-class hands-on sessions, in which selected topics are studied from a practical point of view using the Python language and the PyTorch library.
Assessment methods
The assessment comprises a theoretical part and a practical part.
The theoretical part is an oral exam. Students will present a recent scientific paper related to the course topics, agreed upon in advance with the instructor. Then, students will answer questions on the paper and on the theory discussed in the course.
The practical part is an assignment on topics covered in the hands-on sessions and the theoretical lessons. The assignment must be submitted before sitting for the theoretical part.
The assignment is worth 10 points, the oral exam 22.
The oral exam assesses the student's knowledge of the course content and their ability to understand a scientific paper related to that content and to present it within a limited amount of time.
Grade scale of oral exam:
- Basic knowledge of the topics covered in the exam; some errors in the understanding of the presented paper or superficial understanding of it; generally correct language → 12–13
- Solid knowledge of the topics covered in the exam; correct understanding of the presented paper; correct language use → 14–17
- Thorough knowledge of the topics covered in the exam and ability to reason critically about them; very good understanding of the presented paper; mastery of subject-specific terminology → 18–20
- Excellent knowledge of the topics covered in the exam and ability to reason critically about them; complete understanding of the paper and of the most relevant referenced works; full command of subject-specific terminology → 21–22
Grade scale for the assignment:
- Basic implementation of the task; code is functional but contains errors, lacks structure, or is difficult to follow; minimal experimentation and little to no discussion of results; metrics may be reported but not properly interpreted; limited or unclear exposition → 6
- Correct implementation of the task; results are reproducible; code is organized and mostly readable; metrics are computed and reasonably interpreted; some empirical analysis or comparison of alternatives is present; exposition is clear, though not always thorough → 7–8
- Well-structured and readable code; clear and reproducible workflow; solid experimentation with ablation studies or meaningful comparisons; good understanding of metrics; explanations show thoughtful reflection and interpretation of results → 9
- Excellent and well-structured submission; clear motivation behind design choices; critical and well-documented experimental procedure, including insightful ablations or comparisons; results are convincingly interpreted and contextualized; all material (code, text, plots) is cohesive and professionally presented → 10
Teaching tools
PowerPoint slides (whose PDF printouts are available from the course website before lectures) are projected and discussed during class hours.
Jupyter notebooks of all the hands-on sessions will be available on the course website.
Office hours
See the website of Samuele Salti