Academic Year 2018/2019
- Lecturer: Riccardo Rovatti
- Credits: 9
- SSD: ING-INF/01
- Language of instruction: English
- Modules: Riccardo Rovatti (Module 1), Luca Benini (Module 2), Oreste Andrisano (Module 3)
- Teaching mode: Traditional - in-person lectures (Module 1), Traditional - in-person lectures (Module 2), Traditional - in-person lectures (Module 3)
- Campus: Bologna
- Degree programme: Laurea Magistrale (second-cycle degree) in Electronic Engineering (code 0934)
Learning outcomes
The course gives students a basic knowledge of the problems, and of the corresponding solution techniques, raised by the ever-increasing amount and complexity of the data available for analysis and decision making, i.e., so-called Big Data (BD). These issues are tackled from multiple points of view: the abstract characterization of the mathematical properties of BD, the hardware architectures needed to process them, the ad hoc algorithms developed to cope with the data deluge, and the network issues raised by storing and communicating data collections that may be partitioned in space and time.
Course contents
MODULE 1
The two directions along which Big Data are big
High dimensionality:
- geometric effects of high dimensionality
- computational effects of high dimensionality
- multiplication of large matrices
- dimensionality reduction: JL lemma
- dimensionality reduction: PCA
- dimensionality reduction: compressed sensing, classical and adapted approaches
- interpolation in high-dimensional spaces
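As an illustration of the dimensionality-reduction topics listed above, here is a minimal sketch of a Gaussian random projection in the spirit of the JL lemma. It is not taken from the course material: the function names (`random_projection`, `distance`, `demo_points`), the dimensions, and the seeds are assumptions chosen for the example.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Project d-dimensional points to k dimensions with a random Gaussian matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    # Entries are drawn from N(0, 1/k), so squared norms are preserved in expectation.
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    # Each projected point is the matrix-vector product matrix @ p.
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix] for p in points]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def demo_points(n=5, d=1000, seed=1):
    """Hypothetical test data: n points drawn uniformly from [-1, 1]^d."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
```

Comparing `distance` before and after projecting `demo_points()` to `k=200` dimensions should show pairwise distances changing by only a small relative amount, which is the behaviour the JL lemma quantifies.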
Streaming:
- sampling data in streams
- filtering data in streams
- counting distinct elements in streams
- estimations from streams: number of ones, distinct elements, most common element...
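One standard way to sample data from a stream of unknown length is reservoir sampling. The sketch below is an assumption about what such a technique can look like, not the course's own implementation; the name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

The stream is read once and only `k` items are stored at any time, so memory use does not grow with the length of the stream.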
Prototype problems:
- abstract summary of documents
- Markov chains and pagerank-like algorithms
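To make the link between Markov chains and PageRank-like algorithms concrete, the following sketch runs a power iteration on a small link graph. The function name `pagerank`, the damping factor 0.85, the iteration count, and the uniform-jump rule for pages without outgoing links are assumptions for the example; the sketch also assumes that every link target appears as a key of `links`.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for a PageRank-style ranking.

    links maps each node to the list of nodes it points to.
    """
    nodes = sorted(links)
    n = len(nodes)
    # Start from the uniform distribution over nodes.
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleportation term: with probability (1 - damping) jump to a random node.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A node without outgoing links spreads its rank uniformly over all nodes.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank
```

Each iteration is one step of the underlying Markov chain applied to the current distribution, so the ranks stay non-negative and sum to one.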
MODULE 2
Introduction to data centers:
- High-level architecture
- Compute units, network and storage
- Energy efficiency, techniques for improving PUE
- Trends and directions: scale-up vs. scale-out
Introduction to big data workloads
- Amdahl's law, strong and weak scaling
- Map Reduce: Hadoop
- NoSQL: Cassandra
- In-memory computing: Spark
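For the scaling topics listed above, the two standard formulas can be sketched as follows. Amdahl's law describes strong scaling (fixed problem size); for weak scaling the sketch uses Gustafson's law, which is a choice made here and not something the syllabus names. The function names and the parameter `parallel_fraction` are assumptions.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Strong scaling: speedup on a fixed-size problem (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction, workers):
    """Weak scaling: scaled speedup when the problem grows with the workers (Gustafson's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers
```

With a parallel fraction of 0.9, `amdahl_speedup` can never exceed 10 however many workers are added, whereas `gustafson_speedup` keeps growing linearly with the number of workers.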
In-order CPU
- Pipelining basics
- Pipeline hazards
- Memory hierarchy
- Performance analysis techniques
Out-of-order CPU
- ILP and instruction hazards
- Removing false dependencies: renaming
- Removing control hazards: branch prediction
- Precise interrupts and speculation: reorder buffer
Multicore CPU
- Message passing vs shared memory vs
- parallel execution models, heterogeneous parallelism
- Cache coherency
- Synchronization
Architectural performance estimation and analysis
Office hours
See the website of Riccardo Rovatti
See the website of Luca Benini
See the website of Oreste Andrisano