87198 - STATISTICS AND ARCHITECTURES FOR BIG DATA PROCESSING M

Academic Year 2023/2024

  • Teacher: Mauro Mangia
  • Credits: 6
  • SSD: ING-INF/01
  • Teaching language: English

Learning outcomes

The course gives students a basic understanding of the problems raised by the ever-increasing amount and complexity of the data available for analysis and decision-making, the so-called Big Data (BD), and of the techniques used to solve them. These issues are addressed from multiple points of view, from the abstract characterization of the mathematical properties of BD to the hardware architectures needed to process them.

Course contents

Algorithms (module 1 – Prof. Mauro Mangia):

  • Introduction to Python: packages for data processing and visualization
  • Data analytics: linear algebra for machine learning, basics of statistics and signal transformation
  • Basics of machine learning: autoregressive models, clustering, classification, basics of neural networks, autoencoders
  • Dimensionality reduction, basics of linear algebra in high dimensions
  • Streaming algorithms: basic streaming algorithms for feature extraction, streaming approaches to the PCA/PSA problem.
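
As an illustration of what a basic streaming algorithm for feature extraction can look like, the sketch below computes the running mean and variance of a stream in a single pass with constant memory (Welford's update). It is not taken from the course materials: the class name `RunningStats` and the sample values are assumptions chosen for the example.

```python
class RunningStats:
    """Running mean and population variance of a stream (Welford's update)."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Consume one sample; O(1) time and memory per sample."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the updated mean.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the samples seen so far (NaN if none)."""
        return self._m2 / self.n if self.n else float("nan")


# Hypothetical stream of samples, processed one at a time.
stats = RunningStats()
for sample in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(sample)

print(stats.n, stats.mean, stats.variance)
```

The samples are never stored, which is the property that makes this kind of update usable when the data do not fit in memory.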

Architectures (module 2 – Prof. Francesco Conti):

  • Recap of basic computer architecture: from high-level languages to Instruction Set Architecture; memory hierarchy; in-order processors. Evaluating computers: latency and throughput, memory bandwidth, energy efficiency metrics.
  • High-performance cache hierarchy: direct-mapped and associative caches. Miss rate and miss penalty. Write-through and write-back caches.
  • High-performance processors: branch prediction, out-of-order execution, speculation (main concepts).
  • Vector and SIMD processing.
  • Multi-cores: shared-memory and distributed parallel computing.
  • Large-scale computing systems: warehouse and streaming computing; brief notes on storage and networking.
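
As an illustration of how miss rate and miss penalty combine when evaluating a cache hierarchy, the sketch below applies the standard average memory access time relation (hit time plus miss rate times miss penalty) to a two-level hierarchy. It is not taken from the course materials: the function name `amat` and all cycle counts and miss rates are made-up values for the example.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only, in clock cycles (assumptions, not measurements).
# An L1 miss is served by L2, so the L1 miss penalty is the L2 access time.
l2_time = amat(hit_time=10, miss_rate=0.20, miss_penalty=100)
l1_time = amat(hit_time=1, miss_rate=0.05, miss_penalty=l2_time)

print(f"L2 average access time: {l2_time:.2f} cycles")
print(f"L1 average access time: {l1_time:.2f} cycles")
```

Nesting the calls this way models each level's miss penalty as the average access time of the level below it.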

Readings/Bibliography

Main suggested readings for module 1:

Outlier Analysis
Charu C. Aggarwal
Springer (2nd edition, 2017)

Deep Learning
Ian Goodfellow, Yoshua Bengio, Aaron Courville
MIT Press (2016)

Main suggested readings for module 2:

Computer Architecture: A Quantitative Approach
John L. Hennessy, David A. Patterson
Morgan Kaufmann (2017)

The Datacenter as a Computer: Designing Warehouse-Scale Machines
Luiz André Barroso, Urs Hölzle, Parthasarathy Ranganathan
Morgan & Claypool Publishers (3rd edition)

Basic programming skills (Python / C) are mandatory for the course. Students who have not taken a basic computer architecture course are strongly encouraged to read the following beforehand:

Computer Organization and Design RISC-V Edition: The Hardware/Software Interface
David A. Patterson, John L. Hennessy
Morgan Kaufmann (2017)

Teaching methods

Lectures and laboratory exercises on students' own laptops.

Assessment methods

Learning is assessed through a single oral exam covering both modules, on the topics discussed in the lectures.

Teaching tools

Annotated slides and additional teaching materials available online.

All materials will be shared through the course's official Virtuale site.

Office hours

See the website of Mauro Mangia

See the website of Francesco Conti