87809 - INTRODUCTION TO BIG DATA PROCESSING INFRASTRUCTURES

Academic Year 2018/2019

  • Teacher: Daniele Cesini
  • Credits: 4
  • SSD: FIS/01
  • Language: English
  • Teaching mode: Traditional lectures
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Bioinformatics (cod. 8020)

Learning outcomes

At the end of the course, the student has basic theoretical and practical knowledge of infrastructures for scientific computing, distributed and parallel systems, batch systems and security technologies.

Course contents

The course will provide basic concepts of infrastructures for Big Data processing, including Cloud computing at the Infrastructure-as-a-Service level. The course will start with a description of the building blocks of modern data centers and of how they are abstracted by the Cloud paradigm. A real-life computational challenge will be presented, and during the course students will build a cloud-based computing model to solve it. A very brief introduction to High Performance Computing (HPC) will also be given. The course will conclude with notions about the emerging "fog" and "edge" computing paradigms and how they are linked to Cloud infrastructures.

Program:

1) Introduction to the course and the computational challenge

- Introduction to Big Data

- Presentation of the computational challenge that will accompany us throughout the course.

Hands on:

- Setup of the testbed for the exercises

2) From your laptop to the datacenter - datacenter building blocks

- CPU Farm

i. Batch systems, queues, allocation policies, quotas, etc.

- Storage

I. DAS vs NAS

II. SAN

III. TAN

IV. Parallel FS

V. Data lifecycle, QoS

- Migration, recall, ACL

- Network: main protocols (Ethernet, InfiniBand, Fibre Channel)

- Monitoring and Provisioning

Hands on: job submission on a small cluster already available to students

3) Infrastructures for Parallel Computing

HTC vs HPC

HTC

- Distributed systems

- Grid Computing

HPC

- Shared memory vs distributed memory

- OpenMPI/OpenMP

- Accelerators for parallel computing

- Hybrid and non-standard resources

Energy efficiency and low-power computing

- Towards exascale computing

Hands on: live demo - creation of speedup curves for the NAMD STMV/ApoA1 use cases. Computing on a GPU. Computing on low-power systems.
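As an illustration of what a speedup curve involves, the sketch below turns wall-clock times measured at different core counts into speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p, and compares them with the ideal curve predicted by Amdahl's law, 1/(f + (1 - f)/p), for a serial fraction f. It is a minimal sketch under stated assumptions: the function names `speedup_table` and `amdahl_speedup` are hypothetical, a single-core baseline run is assumed, and the timings in the usage block are placeholders, not real NAMD measurements.

```python
"""Speedup and parallel-efficiency helpers for a strong-scaling study (illustrative sketch)."""


def speedup_table(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Turn {cores: wall_clock_seconds} into sorted (cores, speedup, efficiency) rows.

    Speedup S(p) = T(1) / T(p); efficiency E(p) = S(p) / p.
    """
    if 1 not in timings:
        raise ValueError("a single-core baseline T(1) is required")
    t1 = timings[1]
    rows = []
    for p in sorted(timings):
        s = t1 / timings[p]
        rows.append((p, s, s / p))
    return rows


def amdahl_speedup(p: int, serial_fraction: float) -> float:
    """Ideal speedup on p cores under Amdahl's law for a given serial fraction."""
    if p < 1 or not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("need p >= 1 and 0 <= serial_fraction <= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


if __name__ == "__main__":
    # Hypothetical placeholder timings (seconds), NOT real NAMD measurements.
    measured = {1: 800.0, 2: 410.0, 4: 215.0, 8: 120.0}
    for cores, s, e in speedup_table(measured):
        print(f"{cores:>3} cores  speedup={s:5.2f}  efficiency={e:4.2f}  "
              f"amdahl(f=0.05)={amdahl_speedup(cores, 0.05):5.2f}")
```

Plotting the measured speedup against the Amdahl curve for a few values of f is one simple way to estimate how much of a run is effectively serial.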

4) Cloud IaaS

Cloud Computing: Introduction

Cloud IaaS

i. Advantages and Disadvantages

ii. Application Porting to the Cloud

iii. OpenStack introduction

iv. Amazon vs OpenStack

Cloud Storage - provisioning of block devices and POSIX file systems

Hands on: IaaS instantiation with OpenStack - create the infrastructure to run the course exercises

Instantiation of multiple machines - experience with cloud elasticity - create a mini-cluster - run the course exercise on that cluster

Create storage volumes on the Cloud and make them available to the cluster

5) Creating a computing model in distributed infrastructures and multi-site Clouds

Job Submission strategies

i. Push vs pull

ii. Compute driven model

iii. Workload Management Systems

Data Management strategies

i. Replicas, QoS

ii. Data driven computing models

Failover and Disaster Recovery strategies

6) Computing Continuum

- Low Power devices

- Introduction to Edge Computing

- Introduction to Fog Computing

- The Computing Continuum for Big Data Infrastructures


For interested students, the course will include a visit to the INFN-CNAF datacenter in Bologna.

Readings/Bibliography

Course material will be shared; external MOOCs and books will also be suggested during the course.

Teaching methods

The teaching method combines theoretical foundations with extensive practical considerations on real infrastructures used for Big Data processing, complemented by hands-on sessions.

Assessment methods

There will be an oral exam, focusing on the topics presented during the course.

Students will be asked to prepare a small project, which will be discussed during the exam.

Teaching tools

Slides for the theoretical part; real-world infrastructures for the hands-on sessions

Office hours

See the website of Daniele Cesini