93478 - COMPUTER VISION

Academic Year 2023/2024

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Computer Science (code 5898)

Learning outcomes

At the end of the course, students will be able to implement algorithms for relevant computer vision tasks such as object detection, semantic segmentation, and image and video captioning. During the course they will learn the basics of image and video analysis and of computer vision. They will gain knowledge of the design and implementation of convolutional and recurrent neural networks and of how to combine them. They will also become familiar with the relevant frameworks used to design modern deep architectures.

Course contents

The course introduces concepts and tools for the design and implementation of image acquisition, processing, and analysis techniques. The contents are listed below:

Image Formation and Acquisition: geometry of image formation; lenses; field of view and depth of field; image sampling and quantization.

Spatial Filtering: linear shift-invariant operators; mean and Gaussian filtering; median filtering; bilateral filtering.

Edge Detection: image gradient; non-maxima suppression; Laplacian of Gaussian; Canny edge detector.

Local Invariant Features: detectors and descriptors; Harris corners; scale-invariant features; SIFT features.

Camera Calibration: projective coordinates and perspective projection matrix; intrinsic and extrinsic camera parameters; Zhang's algorithm.

Instance Detection: pattern matching; shape-based matching; Hough transform.

Object Detection: two-stage, one-stage, and anchor-free detectors; RoI pooling operator; feature pyramid networks.

Semantic Segmentation: fully convolutional networks; transposed and dilated convolutions; RoI Align operator; semantic, instance, and panoptic segmentation.

Metric Learning: deep metric learning; contrastive and triplet losses; multi-task learning; application to different recognition/identification tasks.

Attention Mechanism: self-attention in RNNs; Transformer architecture; Vision Transformer.
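
As an illustration of the last topic above, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It is not taken from the course material: the function names `softmax` and `self_attention` and the toy input are hypothetical, and the queries, keys, and values are simply the input tokens themselves (identity projections), whereas a Transformer would learn separate projection matrices for each.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption for illustration: Q = K = V = tokens (no learned projections).
    Returns the attended outputs and the attention weight matrix.
    """
    d = len(tokens[0])  # feature dimension of each token
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Output is the weighted average of the value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(d)])
    return outputs, weights


# Toy sequence of three 2-dimensional tokens (hypothetical data).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to one, so every output token is a convex combination of the input tokens; tokens that are more similar to the query receive a larger share of the weight.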

 

Prerequisites:

- Linear Algebra

- Basic knowledge of Machine Learning and Deep Learning

- Programming and Python

Readings/Bibliography

All the slides from the lectures of the course will be made available on the Virtuale platform. There is no official textbook; further details on some of the topics of the course can be found in:

- Szeliski, Richard. Computer vision: algorithms and applications. Springer Science & Business Media, 2010.

- Zhang, Aston, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. 2020. http://d2l.ai/

Further readings, such as scientific papers and online resources, might be recommended during the lectures of the course.

Teaching methods

Teaching consists of lectures and lab sessions. The datasets and code snippets for the lab sessions will be provided.

The lab code is written in Python and uses the OpenCV and scikit-learn libraries and the PyTorch framework.

Assessment methods

The assessment consists of two parts:

- First part: the student will prepare a detailed seminar on a recent scientific paper related to the topics of the course. The student should also analyse the source code associated with the paper and replicate at least one experiment. If the source code is not available, the student should implement the corresponding algorithm and/or model.

- Second part: oral exam with theoretical questions on the topics presented during the course.

Teaching tools

The PDF of the slides used in the course will be made available on the course website before each lecture.

The Python scripts and datasets required for the lab sessions will be made available on the course website.

Office hours

See the website of Giuseppe Lisanti