87198 - Statistics and Architectures for Big Data Processing M

Academic Year 2022/2023

                
Teacher: Riccardo Rovatti
Credits: 6
SSD: ING-INF/01
Language: English
Modules: Riccardo Rovatti (Module 1); Francesco Conti (Module 2)
Teaching Mode: In-person learning (entirely or partially) (Module 1); In-person learning (entirely or partially) (Module 2)
Campus: Bologna
Course: Second cycle degree programme (LM) in Electronic Engineering (cod. 0934)

Teaching resources on Virtuale

Learning outcomes

The course gives students a basic knowledge of the problems, and of the corresponding solution techniques, raised by the ever-increasing amount and complexity of the data available for analysis and decision-making, i.e., so-called Big Data (BD). These issues are tackled from multiple points of view, from the abstract characterization of the mathematical properties of BD to the hardware architectures needed to process it.

Course contents

The two dimensions of "Big" in Big Data.

Data dimensionality

geometrical effects of high dimensionality and their consequences

Dimensionality reduction

multidimensional Gaussian vectors and their properties
dimensionality reduction by Johnson-Lindenstrauss
dimensionality reduction by SVD/PCA (relationship with Gaussian clustering)
dimensionality reduction by sparse signal recovery/compressed sensing
other uses of SVD/eigenstructures (hub-authority ranking, the core idea of PageRank, document collection summaries)

Interpolation

grid-data multilinear interpolation
grid-data piecewise-linear interpolation
scattered-data interpolation by radial-basis functions

Streaming algorithms

the streaming computation model
streaming random picks and multiplication of huge matrices
streaming estimation of features of the occurrence histogram
hashing for flattening of distributions
random computation: estimations instead of exact results

Teaching methods

Class teaching

Assessment methods

Oral examination

Office hours

See the website of Riccardo Rovatti

See the website of Francesco Conti