88406 - INFRASTRUCTURES FOR BIG DATA PROCESSING

Course Unit Page

  • Teacher: Davide Salomoni

  • Credits: 4

  • SSD: FIS/01

  • Teaching Mode: Traditional lectures

  • Language: English

Academic Year 2019/2020

Learning outcomes

At the end of the course, the student has practical and theoretical knowledge of distributed computing and storage infrastructures, cloud computing and virtualization, and parallel computing, as well as of their application to Big Data analysis.

Course contents

The course on "Infrastructures for Big Data Processing" (BDP2) builds on the course "Introduction to Big Data Processing Infrastructures" (BDP1). Before taking this course, you should have already attended the BDP1 course, or at least be well acquainted with the topics covered there.

The BDP2 course will first recap the foundations of Cloud computing and storage services beyond IaaS (PaaS and SaaS). This will then lead us to understand how to exploit distributed infrastructures for Big Data processing.

 

Introduction to BDP2

  • Course introduction and objectives
  • Clouds beyond IaaS: general points
  • How to use a Cloud infrastructure to follow this course

Cloud Storage

  • File systems and POSIX storage
  • The Network File System (NFS)
  • Object storage, the REST architecture and the JSON format
  • Virtual file systems, the example of Onedata
  • Simple examples of local and remote data processing (see the sketch below)
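
To give a flavour of the last point, here is a minimal Python sketch that retrieves a JSON document from a REST endpoint and performs a trivial local processing step; the URL and the field name are placeholders for illustration, not a service actually used in the course:

    # Minimal sketch: fetch a JSON document via REST and process it locally.
    # The endpoint below is a hypothetical placeholder, not a course service.
    import json
    import urllib.request

    url = "https://objectstore.example.org/bucket/measurements.json"
    with urllib.request.urlopen(url) as response:
        records = json.loads(response.read().decode("utf-8"))

    # Trivial "remote data, local processing" step: average one assumed field.
    values = [record["value"] for record in records]
    print("Average value:", sum(values) / len(values))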

Containers

  • Basic concepts about containers
  • Running and extending containers (see the sketch after this list)
  • Docker Hub and dockerfiles
  • Connecting containers to file systems
  • Exporting and importing containers
  • Docker Compose
  • Running Docker containers in user space
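
As an illustration of running a container and connecting it to a host file system, the following minimal Python sketch uses the Docker SDK for Python (pip install docker). It assumes a local Docker daemon, the image and host path are placeholders, and it is not necessarily the exact tooling used in the hands-on sessions:

    # Minimal sketch with the Docker SDK for Python.
    # Assumes a running local Docker daemon; the host path is a placeholder.
    import docker

    client = docker.from_env()

    # Run a throw-away container with a read-only bind mount and capture its
    # output (roughly equivalent to: docker run --rm -v /tmp/data:/data:ro ...).
    output = client.containers.run(
        "python:3.11-slim",
        "python -c \"print('hello from a container')\"",
        volumes={"/tmp/data": {"bind": "/data", "mode": "ro"}},
        remove=True,
    )
    print(output.decode("utf-8"))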

Authentication and Authorization

  • Principles of Cloud authentication and authorization
  • X.500, LDAP, RADIUS, Kerberos
  • X.509 and public-key cryptography
  • SAML, eduGAIN, IDEM, SPID
  • OAuth and OpenID Connect (see the sketch after this list)
  • INDIGO IAM
  • Adapting an application to use INDIGO IAM
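
To illustrate the kind of token-based flows covered under OAuth and OpenID Connect, here is a minimal Python sketch of an OAuth2 client-credentials request followed by a call to a protected API. All endpoints, client IDs and secrets are placeholders, and the exact flows and parameters used with INDIGO IAM during the course may differ:

    # Minimal sketch of an OAuth2 client-credentials flow.
    # All URLs and credentials below are placeholders.
    import json
    import urllib.parse
    import urllib.request

    token_endpoint = "https://iam.example.org/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": "my-client-id",
        "client_secret": "my-client-secret",
    }).encode("ascii")

    with urllib.request.urlopen(
        urllib.request.Request(token_endpoint, data=form)
    ) as resp:
        access_token = json.loads(resp.read())["access_token"]

    # Present the bearer token to a protected (placeholder) API.
    request = urllib.request.Request(
        "https://api.example.org/protected/resource",
        headers={"Authorization": "Bearer " + access_token},
    )
    with urllib.request.urlopen(request) as resp:
        print(resp.read().decode("utf-8"))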

Cloud Automation

  • What is Cloud Automation
  • Microservices and monoliths
  • The DevOps concept
  • Container orchestration: Docker Swarm
  • Container orchestration: Kubernetes and Mesos (see the sketch after this list)
  • Infrastructure as Code: serverless technologies
  • Data ingestion, data processing and data querying
  • Template-based orchestration
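
As a small taste of container orchestration driven from code, the following sketch uses the official Kubernetes Python client (pip install kubernetes) to list the pods in a namespace. It assumes a reachable cluster and a local kubeconfig file; the orchestrators and tools actually used in the course may differ:

    # Minimal sketch with the official Kubernetes Python client.
    # Assumes a cluster is reachable via the default local kubeconfig.
    from kubernetes import client, config

    config.load_kube_config()      # read ~/.kube/config
    v1 = client.CoreV1Api()

    # List the pods running in the "default" namespace.
    for pod in v1.list_namespaced_pod(namespace="default").items:
        print(pod.metadata.name, pod.status.phase)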

 

Readings/Bibliography

Course material will be shared; in addition, external MOOCs and books will be suggested during the course.

Teaching methods

The teaching will be based on theoretical foundations, strongly complemented by practical considerations about real infrastructures used for Big Data processing and by hands-on sessions.

Assessment methods

The exam will be oral only, focusing on the topics presented during the course.

Teaching tools

Slides for the theory; real-world infrastructures for the hands-on sessions.

Note that a personal laptop (running Windows, Linux or macOS - no tablets) is required during the lectures to follow the presented material and the hands-on sessions.

Office hours

See the website of Davide Salomoni