78637 - STATISTICS, ALGORITHMS AND SYSTEMS FOR BIG DATA PROCESSING M

Academic Year 2017/2018

  • Teacher: Riccardo Rovatti
  • Credits: 12
  • SSD: ING-INF/01
  • Language of instruction: English
  • Modules: Riccardo Rovatti (Module 1); Luca Benini (Module 2); Claudio Sartori (Module 3); Oreste Andrisano (Module 4)
  • Teaching mode: Traditional, in-person lectures (Module 1); Traditional, in-person lectures (Module 2); Traditional, in-person lectures (Module 3); Traditional, in-person lectures (Module 4)
  • Campus: Bologna
  • Degree programme: Laurea Magistrale (second-cycle degree) in Electronic Engineering (code 0934)

Learning outcomes

The course gives students a basic knowledge of the problems, and the corresponding solution techniques, raised by the ever-increasing amount and complexity of the data available for analysis and decision-making, i.e., so-called Big Data (BD). These issues are tackled from multiple points of view: from the abstract characterization of the mathematical properties of BD to the hardware architectures needed to process them, and from the ad hoc algorithms developed to cope with the data deluge to the network issues raised by storing and communicating data collections that may be partitioned in space and time.

Course contents

Module 1
* Intro: the two dimensions of Big Data: high dimensionality and streaming
* High dimensionality:
- geometric effects of high dimensionality
- computational effects of high dimensionality
- dimensionality reduction, Johnson-Lindenstrauss (JL) lemma
- compressed sensing: classical and adapted approaches
* Streaming:
- sampling data in streams
- filtering data in streams
- counting distinct elements in streams
- estimation from streams: number of ones, distinct elements, most common element...

Module 2
- parallel computer architecture, main classes and scalability (6 hours)
- big data workloads: characteristics and requirements (4 hours)
- processing engines for big data: CPUs, GPUs, accelerators... (8 hours)
- memory hierarchy and I/O systems for big data (8 hours)
- case studies (4 hours)

Module 3
- Basic networking, network layers, SDN and VNF
- Batch processing vs. stream processing of Big Data: constraints and network design
- Consumer requirements and network design: the value chain - CP
- Crowdsourcing, environmental monitoring
- Multidimensional sampling theory
- Reconstruction techniques from random samples
- Realistic scenarios for Big Data acquisition: effects of measurement errors and theoretical limits due to the trade-off between precision and communication constraints

Module 4

This module is equivalent to the first module of "75194 - DATA MINING M".

Assessment methods

Oral examination.

Office hours

See the website of Riccardo Rovatti

See the website of Luca Benini

See the website of Claudio Sartori

See the website of Oreste Andrisano