93478 - COMPUTER VISION

Academic Year 2021/2022

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Computer Science (code 8028)

Learning outcomes

At the end of the course, students will be able to implement algorithms addressing relevant computer vision tasks, such as object detection, semantic segmentation, and image and video captioning. During the course they will learn the basics of image and video analysis and of computer vision. They will gain knowledge of the design and implementation of convolutional neural networks and recurrent neural networks, and of how to combine them. They will also become familiar with the relevant frameworks used to design modern deep architectures.

Course contents

The course introduces concepts and tools for the design and implementation of image/video acquisition, processing, and analysis techniques.

The first part of the course will focus on the fundamental concepts of image formation and acquisition, such as: geometry of image formation, pinhole camera model, perspective projection, projective coordinates and perspective projection matrix, camera calibration, and image rectification.
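As an illustration of the pinhole camera model and the perspective projection matrix listed above, the following is a minimal sketch and not course material. It assumes a camera placed at the world origin with no rotation, so the matrix reduces to P = K [I | 0]. The function names and the intrinsic values (focal lengths and principal point in pixels) are hypothetical.

```python
def projection_matrix(fx, fy, cx, cy):
    """Build the 3x4 matrix P = K [I | 0] for a camera at the world origin.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    """
    return [
        [fx, 0.0, cx, 0.0],
        [0.0, fy, cy, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project(P, point_3d):
    """Project a 3D point (camera coordinates) to pixel coordinates."""
    # Lift the point to projective (homogeneous) coordinates: (X, Y, Z, 1).
    X = list(point_3d) + [1.0]
    # Matrix-vector product gives the homogeneous image point (u_h, v_h, w_h).
    u_h, v_h, w_h = (sum(p * x for p, x in zip(row, X)) for row in P)
    if w_h <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division converts back to Euclidean pixel coordinates.
    return (u_h / w_h, v_h / w_h)


# Hypothetical intrinsics, chosen only for illustration.
P = projection_matrix(fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(project(P, (1.0, 2.0, 4.0)))
print(project(P, (0.0, 0.0, 5.0)))  # a point on the optical axis
```

A point on the optical axis maps to the principal point, which is a quick sanity check for any choice of intrinsics.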

Then it will focus on the extraction of hand-crafted features and their matching: edges and corners, detectors and descriptors, scale invariant features, SIFT features, efficient feature matching, image stitching, bag of visual words.

The second part of the course will focus on modern deep learning architectures proposed for:

- object detection (two-stage, one-stage, and anchor-free detectors, RoI pooling operator, feature pyramid networks)

- semantic segmentation (fully convolutional networks, transposed and dilated convolutions, RoI Align operator, architectures for semantic, instance, and panoptic segmentation)

- self-attention (transformer architecture, vision transformer, data-efficient image transformer)

- metric learning (deep metric learning, contrastive and triplet losses, multi-task learning, application to different recognition/identification tasks)
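As an illustration of the triplet loss mentioned under metric learning, the following is a minimal sketch and not course material. It assumes Euclidean (not squared) distances between embeddings and a hinge with a fixed margin; some formulations use squared distances instead. The function names and the example vectors are hypothetical, and the sketch uses plain Python rather than the frameworks adopted in the lab sessions.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Badly arranged triplet: the positive is farther away than the negative.
print(triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0)))
# Well-separated triplet: the margin is already satisfied.
print(triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))
```

In a deep metric learning setting the three vectors would be network embeddings of an anchor image, an image of the same identity, and an image of a different identity.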

Readings/Bibliography

All the slides from the lectures of the course will be made available on the Virtuale platform. There is no official textbook. Several resources are freely available online:

Zhang, Aston, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. 2020. http://d2l.ai/

Szeliski, Richard. Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.

Further readings, such as scientific papers and online resources, might be recommended during the lectures of the course.

Teaching methods

Teaching consists of lectures and lab sessions. The datasets and code snippets for the lab sessions will be provided. The lab code will be written (mostly) in Python, using the PyTorch, scikit-learn, and OpenCV libraries.

Assessment methods

The assessment consists of two parts:

- First part: the student will prepare a detailed seminar on a recent scientific paper related to the topics of the course. The presentation should also include a detailed description and analysis of the source code associated with the paper and of the results achieved. If the source code is not available, the student should implement the corresponding model.

- Second part: oral exam with theoretical questions on the topics presented during the course.

Teaching tools

The PDF of the slides used in the course will be made available on the course website before each lecture.

The Python scripts and datasets required for the lab sessions will be made available on the course website.

Office hours

See the website of Giuseppe Lisanti