78637 - Statistics, Algorithms and Systems for Big Data Processing M

Academic Year 2017/2018

  • Modules: Riccardo Rovatti (Module 1), Luca Benini (Module 2), Claudio Sartori (Module 3), Oreste Andrisano (Module 4)
  • Teaching Mode: Traditional lectures (Modules 1, 2, 3 and 4)
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Electronic Engineering (code 0934)

Learning outcomes

The course gives students a basic knowledge of the problems, and of the corresponding solution techniques, raised by the ever-increasing amount and complexity of the data available for analysis and decision-making, i.e., so-called Big Data (BD). These issues are tackled from multiple points of view: from the abstract characterization of the mathematical properties of BD to the hardware architectures needed to process them, and from the ad hoc algorithms developed to cope with the data deluge to the network issues raised by storing and communicating data collections that may be partitioned in space and time.

Course contents

Module 1
* Intro: the two dimensions of Big Data: high dimensionality and streaming
* High dimensionality:
- geometric effects of high dimensionality
- computational effects of high dimensionality
- dimensionality reduction, Johnson-Lindenstrauss (JL) lemma
- compressed sensing: classical and adapted approaches
* Streaming:
- sampling data in streams
- filtering data in streams
- counting distinct elements in streams
- estimations from streams: number of ones, distinct elements, most common element...

Module 2
- parallel computer architecture: main classes and scalability (6 hours)
- big data workloads: characteristics and requirements (4 hours)
- processing engines for big data: CPUs, GPUs, accelerators... (8 hours)
- memory hierarchy and I/O systems for big data (8 hours)
- case studies (4 hours)

Module 3
- basic networking, network layers, SDN and VNF
- batch processing vs stream processing of Big Data: constraints and network design
- consumer requirements and network design: the value chain - CP
- crowdsourcing, environmental monitoring
- multidimensional sampling theory
- reconstruction techniques from random samples
- realistic scenarios for Big Data acquisition: effects of measurement errors and theoretical limits due to the tradeoff between precision and communication constraints

Module 4

This module is equivalent to the first module of "75194 - DATA MINING M".

Assessment methods

Oral examination.

Office hours

See the website of Riccardo Rovatti

See the website of Luca Benini

See the website of Claudio Sartori

See the website of Oreste Andrisano