91266 - Machine Learning for Computer Vision

SDGs

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education

Academic Year 2020/2021

Learning outcomes

At the end of the course, the student masters the most popular modern machine-learning approaches to computer-vision tasks, with particular reference to specialized deep-learning architectures. The student has both the theoretical understanding and the practical skills required to develop state-of-the-art image and video analysis systems for real-world applications.

Course contents

The course is held in the first semester, from September to December.

Topics:

  1. Image classification/recognition: limits of hand-crafted methods; brief review of machine learning and data-driven methods; bag-of-words models applied to dense descriptors; introduction to Neural Networks (NNs) and Convolutional NNs; AlexNet and the deep learning revolution. Advanced and state-of-the-art architectures for image classification.
  2. Intro to PyTorch. Hands-on sessions where we implement and train ResNets both "from scratch" and via transfer learning from ImageNet on some simple datasets; discussion of the impact of several design choices.
  3. Object detection. Introduction to ensemble learning via boosting. The Viola-Jones detector and its applications. Specialized NN architectures for object detection. Hands-on session on implementing and training a state-of-the-art network on a simple dataset.
  4. Dense prediction problems: semantic/instance segmentation, depth from mono/stereo, optical flow. Ensemble learning via bagging and random forests. The algorithm behind the Kinect body part segmentation. Specialized architectures for segmentation and depth estimation. Hands-on session on implementing and training a state-of-the-art network for dense prediction on a simple dataset.
  5. Representation learning via self-supervision. Deep metric learning and its applications to face recognition.
  6. 3D Computer Vision. Intro to 3D data: point clouds, meshes, voxel grids, normals. Hand-crafted and learned local invariant features. Specialized architectures to classify, segment, and register point clouds. Hands-on session on implementing and training a state-of-the-art network for 3D point cloud processing.

Prerequisites:

  1. Basic knowledge of the Python programming language
  2. Basic knowledge of computer vision and image processing: image formation, image digitization, camera modelling and calibration, basic image manipulation and processing, local image features, basics of stereo vision.
  3. Basic knowledge of machine learning: supervised versus unsupervised learning, classification versus regression, underfitting and overfitting, regularization; splitting data into training, validation and test sets; hyper-parameters and cross-validation.

If you attended Computer Vision and Image Processing, you already fulfill the first and second prerequisites. If you didn't, you can ask Prof. Di Stefano for access to his slides and lab sessions.
If you attended the Machine Learning and Deep Learning course of the LM in AI, you already fulfill the third prerequisite. If you didn't, you can ask Prof. Sartori for access to his slides.
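
As a quick self-check for the first and third prerequisites, the following minimal sketch splits a dataset into training, validation and test sets using only the Python standard library. It is an illustration, not course material: the function name `split_indices`, the 70/15/15 proportions and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_indices(n_samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test lists.

    Hypothetical helper for illustration; the fractions and seed are arbitrary.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)

    n_test = round(n_samples * test_fraction)
    n_val = round(n_samples * val_fraction)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The test set is set aside once and used only for the final evaluation, while the validation set is the one used to compare hyper-parameter choices. Shuffling before splitting avoids a biased split when the samples are stored in some order, for example grouped by class.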

Readings/Bibliography

The main reference material will be the slides and notes provided on IOL by the instructor. A set of pointers to scientific papers and technical reports for each topic will also be provided during lectures.

Several freely available online resources can be useful to complement the material provided by the instructor on IOL.

Teaching methods

Taught lessons.

Theory lessons are complemented by in-class hands-on sessions, where selected topics will be studied from a practical point of view by using the Python language and the PyTorch library.

Assessment methods

The exam is oral, covers the topics presented during the course, and consists of two parts.

In the first part, students will present a recent scientific paper, assigned to them in advance and related to the course topics.

In the second part, students will answer questions on the theory presented in the course.

Teaching tools

PowerPoint slides (whose PDF printouts are available from the course website before lectures) are projected and discussed during class hours.

Jupyter notebooks for all the hands-on sessions will be available on the course website. 

Office hours

See the website of Samuele Salti