87198 - Statistics and Architectures for Big Data Processing M

Academic Year 2019/2020

Learning outcomes

The course provides students with basic knowledge of the problems raised by the ever-increasing amount and complexity of data available for analysis and decision-making, i.e., so-called Big Data (BD), and of the techniques used to solve them. These issues are tackled from multiple points of view: from the abstract characterization of the mathematical properties of BD to the hardware architectures needed to process them.

Course contents

The two dimensions of "Big" in Big Data.

Data dimensionality

  • geometrical effect of high dimensionality and consequences
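
As an illustration of the geometrical effects of high dimensionality, the short sketch below (an illustrative example, not course material; it assumes NumPy) computes two classic quantities. The first is the fraction of the cube [-1, 1]^d filled by the inscribed unit ball. The second is the relative contrast between the farthest and nearest of n random points as seen from a random query, which shrinks as distances concentrate. The function names and the parameters n and seed are hypothetical choices.

```python
import math
import numpy as np

def ball_to_cube_volume_ratio(d: int) -> float:
    """Fraction of the cube [-1, 1]^d occupied by the inscribed unit ball."""
    # V_ball = pi^(d/2) / Gamma(d/2 + 1) and V_cube = 2^d; work in logs to avoid overflow.
    log_ratio = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1) - d * math.log(2)
    return math.exp(log_ratio)

def relative_contrast(d: int, n: int = 500, seed: int = 0) -> float:
    """(max - min) / min Euclidean distance from a random query to n uniform points in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, d))
    query = rng.random(d)
    dist = np.linalg.norm(points - query, axis=1)
    return float((dist.max() - dist.min()) / dist.min())

for d in (2, 10, 100, 1000):
    print(f"d={d:5d}  ball/cube={ball_to_cube_volume_ratio(d):.3e}  contrast={relative_contrast(d):.3f}")
```

As d grows, the ball's share of the cube collapses (pi/4, about 0.785, at d = 2; about 2.5e-3 at d = 10) and the contrast shrinks, so "nearest" and "farthest" neighbors become hard to tell apart.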

Dimensionality reduction

  • multidimensional Gaussian vectors and their properties
  • dimensionality reduction by Johnson-Lindenstrauss
  • dimensionality reduction by SVD/PCA (relationship with Gaussian clustering) 
  • dimensionality reduction by sparse signal recovery/compressed sensing
  • other uses of SVD/eigenstructures: hub-authority ranking, the core idea of PageRank, document collection summaries
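
Three of the techniques listed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the course's reference code. `jl_project` uses a dense Gaussian matrix with N(0, 1/k) entries, which is one common Johnson-Lindenstrauss construction among several. `pca` obtains principal directions from the SVD of the centered data matrix. `pagerank` runs power iteration on the textbook damped formulation; the damping factor 0.85 and the uniform redistribution of rank from dangling pages are assumptions. All function names and defaults are hypothetical.

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Johnson-Lindenstrauss style random projection of the rows of X to k dimensions."""
    rng = np.random.default_rng(seed)
    # i.i.d. N(0, 1/k) entries preserve squared norms in expectation
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

def pca(X, k):
    """Top-k principal directions, projected data and explained variances via SVD."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                           # rows are orthonormal principal directions
    explained_var = S[:k] ** 2 / (X.shape[0] - 1)
    return components, Xc @ components.T, explained_var

def pagerank(A, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank; A[i, j] = 1 if page i links to page j."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    dangling = out_deg == 0
    # column-stochastic transition matrix: M[j, i] = A[i, j] / out_deg[i]
    M = np.zeros((n, n))
    M[:, ~dangling] = (A[~dangling] / out_deg[~dangling, None]).T
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # dangling pages spread their rank uniformly; teleport with probability 1 - alpha
        r_new = alpha * (M @ r + r[dangling].sum() / n) + (1 - alpha) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r
```

For instance, projecting 50 points from 2000 to 800 dimensions with `jl_project` keeps every pairwise distance close to its original value. JL theory says the target dimension needs to grow only with the logarithm of the number of points, not with the original dimension.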

Interpolation

  • grid-data multilinear interpolation
  • grid-data piecewise-linear interpolation
  • scattered-data interpolation by radial-basis functions
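
The following is a minimal 2-D sketch of the three schemes. It assumes NumPy and a rectilinear grid with values V[i, j] = f(xs[i], ys[j]). The piecewise-linear variant splits each cell into two triangles along one diagonal, which is one of several possible triangulations. The Gaussian kernel and its shape parameter eps=6.0 in `rbf_fit` are likewise illustrative choices, and all names are hypothetical. Queries are assumed to lie inside the grid; outside it the formulas extrapolate.

```python
import numpy as np

def _cell(grid, v):
    """Index i of the grid interval [grid[i], grid[i+1]] containing v, and the local coordinate."""
    i = int(np.clip(np.searchsorted(grid, v) - 1, 0, len(grid) - 2))
    return i, (v - grid[i]) / (grid[i + 1] - grid[i])

def bilinear(xs, ys, V, x, y):
    """Multilinear (here 2-D bilinear) interpolation from the 4 corners of the enclosing cell."""
    i, tx = _cell(xs, x)
    j, ty = _cell(ys, y)
    return ((1 - tx) * (1 - ty) * V[i, j] + tx * (1 - ty) * V[i + 1, j]
            + (1 - tx) * ty * V[i, j + 1] + tx * ty * V[i + 1, j + 1])

def piecewise_linear(xs, ys, V, x, y):
    """Piecewise-linear interpolation: each cell is split into two triangles along its diagonal."""
    i, tx = _cell(xs, x)
    j, ty = _cell(ys, y)
    v00, v10, v01, v11 = V[i, j], V[i + 1, j], V[i, j + 1], V[i + 1, j + 1]
    if tx >= ty:  # lower triangle with vertices (0,0), (1,0), (1,1)
        return v00 + tx * (v10 - v00) + ty * (v11 - v10)
    return v00 + ty * (v01 - v00) + tx * (v11 - v01)  # upper triangle (0,0), (0,1), (1,1)

def rbf_fit(points, values, eps=6.0):
    """Gaussian RBF interpolant through scattered data: solve Phi w = values."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    w = np.linalg.solve(np.exp(-(eps * d) ** 2), values)
    def s(q):
        r = np.linalg.norm(points - np.asarray(q), axis=1)
        return float(np.exp(-(eps * r) ** 2) @ w)
    return s
```

Bilinear interpolation reproduces any function of the form a + bx + cy + dxy exactly, while the triangle-based scheme reproduces only affine functions but reads 3 rather than 4 grid values per query. The gap widens in d dimensions: d + 1 simplex vertices versus 2^d cell corners.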

Streaming algorithms

  • the streaming computation model
  • streaming random picks and multiplication of huge matrices
  • streaming estimation of features of the occurrence histogram
  • hashing to flatten distributions
  • randomized computation: estimates instead of exact results
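
Several items above can be illustrated with three compact sketches, written in plain Python plus NumPy and not taken from the course. Reservoir sampling (Algorithm R) keeps a uniform random sample of k items in a single pass. `sampled_matmul` approximates A @ B from c column/row pairs drawn with probability proportional to the product of their norms, which is one standard randomized scheme. `ams_f2` is an Alon-Matias-Szegedy style estimator of the second moment F2 of the occurrence histogram; it uses ±1 hashes built from random cubic polynomials modulo a prime and assumes non-negative integer items. The modulus, the estimator counts, and all function names are hypothetical choices.

```python
import random
import numpy as np

def reservoir_sample(stream, k, seed=0):
    """Uniform sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)        # uniform index in [0, i]
            if j < k:
                reservoir[j] = item         # item i is kept with probability k / (i + 1)
    return reservoir

def sampled_matmul(A, B, c, seed=0):
    """Approximate A @ B from c sampled column/row pairs (norm-proportional sampling)."""
    rng = np.random.default_rng(seed)
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = weights / weights.sum()
    idx = rng.choice(A.shape[1], size=c, p=p)    # sample indices with replacement
    scale = 1.0 / np.sqrt(c * p[idx])            # rescaling makes the estimate unbiased
    return (A[:, idx] * scale) @ (B[idx, :] * scale[:, None])

P = 2**61 - 1  # Mersenne prime used as hash modulus (illustrative choice)

def make_sign_hash(rng):
    """Random +/-1 hash from a degree-3 polynomial mod P (4-wise independent hash values)."""
    a = [rng.randrange(P) for _ in range(4)]
    def sign(x):
        h = (((a[3] * x + a[2]) * x + a[1]) * x + a[0]) % P
        return 1 if h & 1 else -1
    return sign

def ams_f2(stream, num_estimators=200, seed=0):
    """AMS estimate of F2 = sum of squared occurrence counts, one counter per hash."""
    rng = random.Random(seed)
    signs = [make_sign_hash(rng) for _ in range(num_estimators)]
    z = [0] * num_estimators
    for item in stream:                          # single pass, O(num_estimators) memory
        for j, s in enumerate(signs):
            z[j] += s(item)
    return sum(v * v for v in z) / num_estimators  # E[Z^2] is (essentially) F2
```

Each function returns an estimate rather than an exact result. Averaging more AMS counters, or drawing more samples c, trades memory and time for lower variance.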


Teaching methods

Class teaching

Assessment methods

Oral examination

Office hours

See the website of Riccardo Rovatti

See the website of Luca Benini