87198 - STATISTICS AND ARCHITECTURES FOR BIG DATA PROCESSING M

Course unit page

Academic Year 2020/2021

Learning outcomes

The course provides students with a basic knowledge of the problems, and the corresponding solution techniques, that arise from the ever-increasing amount and complexity of the data available for analysis and decision-making, i.e., so-called Big Data (BD). These issues are tackled from multiple points of view: from the abstract characterization of the mathematical properties of BD to the hardware architectures needed to process them.

Course contents

The two dimensions of "Big" in Big Data.

Data dimensionality

  • geometrical effects of high dimensionality and their consequences

Dimensionality reduction

  • multidimensional Gaussian vectors and their properties
  • dimensionality reduction by Johnson-Lindenstrauss
  • dimensionality reduction by SVD/PCA (relationship with Gaussian clustering)
  • dimensionality reduction by sparse signal recovery/compressed sensing
  • other uses of SVD/eigenstructures: hub-authority ranking, the core idea of PageRank, document collection summaries

Interpolation

  • grid-data multilinear interpolation
  • grid-data piecewise-linear interpolation
  • scattered-data interpolation by radial-basis functions

Streaming algorithms

  • the streaming computation model
  • streaming random picks and multiplication of huge matrices
  • streaming estimation of features of the occurrence histogram
  • hashing for flattening of distributions
  • randomized computation: estimates instead of exact results

Teaching methods

Classroom lectures

Assessment methods

Oral examination

Office hours

See the website of Riccardo Rovatti

See the website of Luca Benini