84423 - STATISTICS AND ARCHITECTURES FOR BIG DATA PROCESSING M

Academic Year 2018/2019

Teacher: Riccardo Rovatti
Credits: 9
SSD: ING-INF/01
Language of instruction: English

Modules: Riccardo Rovatti (Module 1), Luca Benini (Module 2), Oreste Andrisano (Module 3)
Teaching mode: In-person lectures (entirely or partially) (Module 1); In-person lectures (entirely or partially) (Module 2); In-person lectures (entirely or partially) (Module 3)
Campus: Bologna
Degree programme: Second-cycle degree (Laurea Magistrale) in Electronic Engineering (cod. 0934)

Learning outcomes

The course provides students with basic knowledge of the problems, and of the corresponding solution techniques, raised by the ever-increasing amount and complexity of the data available for analysis and decision making, i.e., so-called Big Data (BD). These issues are tackled from multiple points of view: from the abstract characterization of the mathematical properties of BD to the hardware architectures needed to process them, and from the ad hoc algorithms developed to cope with the data deluge to the network issues raised by the storage and communication of data collections that may be partitioned in space and time.

Course contents

MODULE 1

The two directions along which Big Data are big

High dimensionality:

geometric effects of high dimensionality
computational effects of high dimensionality
multiplication of large matrices
dimensionality reduction: JL lemma
dimensionality reduction: PCA
dimensionality reduction: compressed sensing, classical and adapted approaches
interpolation in high-dimensional spaces
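
As an illustration of the dimensionality-reduction topics above, the following is a minimal sketch of a Gaussian random projection, the kind of linear map the JL lemma is usually stated for. It is not taken from the course material: the function name `random_projection`, its parameters, and the choice of a dense Gaussian matrix scaled by 1/sqrt(target_dim) are assumptions made for this example.

```python
import math
import random


def random_projection(points, target_dim, seed=0):
    """Project points (lists of floats) onto target_dim random directions.

    Hypothetical helper: each matrix entry is drawn from N(0, 1) and scaled
    by 1/sqrt(target_dim), so squared lengths are preserved in expectation.
    """
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One random row per output coordinate.
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    # Plain matrix-vector product for every point.
    return [
        [sum(r * x for r, x in zip(row, point)) for row in matrix]
        for point in points
    ]
```

For example, `random_projection(points, 20)` maps a list of 1000-dimensional points to 20 dimensions. How well pairwise distances are preserved depends on the target dimension and on the number of points.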

Streaming:

sampling data in streams
filtering data in streams
counting distinct elements in streams
estimations from streams: number of ones, distinct elements, most common element...
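
As an illustration of sampling data in streams, the following is a minimal sketch of reservoir sampling, which keeps a uniform random sample of k items from a stream of unknown length using memory proportional to k. The course may present a different technique; the name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable.

    Hypothetical helper: the stream is read once and its length need not
    be known in advance. If it has fewer than k items, all are returned.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number i + 1 is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

A call such as `reservoir_sample(open_stream(), 100)`, where `open_stream` is a placeholder for any iterator, never holds more than 100 items at a time.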

Prototype problems:

abstract summary of documents
Markov chains and pagerank-like algorithms
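
As an illustration of a pagerank-like algorithm, the following is a minimal sketch of power iteration on the Markov chain of a random surfer. The function name `pagerank`, the damping value 0.85, the fixed iteration count, and the handling of nodes without outgoing links are assumptions made for this example, not the course's formulation.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the stationary distribution of a random surfer.

    Hypothetical helper: `links` maps each node to the list of nodes it
    points to, and every target must also appear as a key of `links`.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleportation: with probability 1 - damping, jump to a random node.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank
```

The returned values sum to 1, so they can be read as the long-run fraction of time the surfer spends on each node.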

MODULE 2

Introduction to data centers:

High-level architecture
Compute units, network and storage
Energy efficiency, techniques for improving PUE
Trends and directions: scale-up vs. scale-out

Introduction to big data workloads

Amdahl's law, strong and weak scaling
Map Reduce: Hadoop
NoSQL: Cassandra
In-memory computing: Spark
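
As an illustration of Amdahl's law and of the difference between strong and weak scaling, the following is a minimal sketch of the two standard speedup formulas: Amdahl's law for a fixed problem size, and Gustafson's law for a problem size that grows with the number of workers. The function and parameter names are assumptions made for this example.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Strong scaling: fixed problem size, serial part does not shrink."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction, workers):
    """Weak scaling: the parallel part of the work grows with the workers."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers
```

With 90% of the work parallelizable, `amdahl_speedup(0.9, workers)` can never exceed 10 however many workers are added, whereas `gustafson_speedup(0.9, workers)` keeps growing linearly because the workload grows too.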

In-order CPU

Pipelining basics
Pipeline hazards
Memory hierarchy
Performance analysis techniques

Out-of-order CPU

ILP and instruction hazards
Removing false dependencies: renaming
Removing control hazards: branch prediction
Precise interrupts and speculation: reorder buffer

Multicore CPU

Message passing vs shared memory vs
parallel execution models, heterogeneous parallelism
Cache coherency
Synchronization

Architectural performance estimation and analysis

Office hours

See the website of Riccardo Rovatti

See the website of Luca Benini

See the website of Oreste Andrisano