- Teacher: Daniele Cesini
- Credits: 4
- SSD: FIS/01
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Bologna
- Degree programme: Second cycle degree programme (LM) in Bioinformatics (code 8020)
Learning outcomes
At the end of the course, the student has basic theoretical and practical knowledge of infrastructures for scientific computing, distributed and parallel systems, batch systems and security technologies.
Course contents
The course provides the basic concepts of infrastructures for processing Big Data and for running scientific applications, with a particular focus on the Infrastructure-as-a-Service (IaaS) Cloud paradigm. It starts with an introduction to Big Data and its relation to scientific applications, then describes the building blocks of modern data centers and how they are abstracted by Cloud computing models. Students will be given a real-life computational challenge and, during the course, will create a Cloud-based computing model to solve it. Access to a limited set of Cloud resources and services will be granted to students so that they can complete the exercises. Containers, and Docker containers in particular, will be introduced, as will the concept of High Performance Computing (HPC). The course concludes with notions about the emerging “Fog” and “Edge” computing paradigms and how they are linked to Cloud infrastructures.
Program:
1) Introduction to the course and the computational challenge
Big Data
- Big Data definition
- Big Data applications classification
- Big Data applications examples
- Big Data and scientific applications
- Presentation of the computational challenge that will accompany us during the course.
Hands on:
- Set up of connections and login
2) From your laptop to the datacenter - datacenter building blocks
CPU Farm
i. Batch system, queues, allocation policies, quotas, etc.
Storage
I. DAS vs NAS
II. SAN
III. TAN
IV. Parallel FS
V. Data lifecycle, QoS
- Migration, recall, ACL
Network: main protocols (Ethernet, InfiniBand, Fibre Channel)
Monitoring and Provisioning
Hands on: Submission on a small cluster already available to students
3) Infrastructures for Parallel Computing
HTC vs HPC
HTC
- Distributed systems
- Grid Computing
HPC
- Shared memory vs distributed memory
- OpenMP/Open MPI
- Accelerators for parallel computing
- Hybrid and non-standard resources
Energy efficiency and Low-power computing
- Towards exascale computing
Hands on: Live demo - Creation of speedup curves for the NAMD STMV/ApoA1 use cases. Computing on a GPU. Computing on low-power systems.
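As an illustration of what a speedup curve is, the following minimal Python sketch derives speedup and parallel efficiency from wall-clock timings. It is not part of the course material: the function name `speedup_curve` and the timings are hypothetical placeholders, not NAMD measurements.

```python
def speedup_curve(timings):
    """Return (cores, speedup, efficiency) tuples sorted by core count.

    `timings` maps a core count to a wall-clock time in seconds and must
    include the single-core baseline under key 1 (an assumption of this sketch).
    """
    baseline = timings[1]
    curve = []
    for cores in sorted(timings):
        speedup = baseline / timings[cores]   # S(n) = T(1) / T(n)
        efficiency = speedup / cores          # E(n) = S(n) / n
        curve.append((cores, speedup, efficiency))
    return curve


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    example_timings = {1: 100.0, 2: 50.0, 4: 40.0}
    for cores, speedup, efficiency in speedup_curve(example_timings):
        print(f"{cores:>3} cores  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

An efficiency that drops well below 1 as cores are added indicates that the run is no longer scaling well on that system.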
5) Cloud Infrastructure
Cloud Computing: Introduction
Cloud Computing Dimensions - IaaS, PaaS, SaaS, service and isolation models
Cloud IaaS
i. Advantages and Disadvantages
ii. Application Porting to the Cloud
iii. AWS Usage
Cloud Storage - provisioning of block device and POSIX filesystems
Hands on:
- IaaS instantiation with AWS - create the infrastructure to run the course exercises
- Instantiation of multiple machines - experience on cloud elasticity - Create a mini-cluster - Run the course exercise on that cluster
- Create storage volumes on the Cloud and make them available to the cluster
- Hadoop cluster creation
- MapReduce introduction and exercise
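To give an idea of the MapReduce model named above, here is a minimal single-process word count sketch in Python. It only illustrates the map, shuffle and reduce phases; it does not use the Hadoop API, it is not the course exercise, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network; here everything runs in one process to keep the structure visible.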
6) Introduction to Containers
- Basic concepts about containers
- Running and extending containers
- Docker Hub and Dockerfiles
- Connecting containers to file systems
- Exporting and importing containers
- Docker Compose
- Running docker containers in userspace with udocker
7) Computing Continuum
- Low Power devices
- Introduction to Edge Computing
- Introduction to Fog Computing
- The Computing Continuum for Big Data Infrastructures
- Energy efficiency and Low-power computing
- Towards exascale computing
For interested students, the course will include a visit to the INFN-CNAF datacenter in Bologna.
Readings/Bibliography
Course material will be shared; external MOOCs and books will also be suggested during the course.
Teaching methods
The course combines theoretical foundations with practical considerations drawn from real infrastructures used for Big Data processing, and with hands-on sessions.
Given the type of activities and teaching methods involved, all students must complete the following e-learning Modules 1 and 2 before attending this course:
Module 1 – Safety General Training
Module 2 – Safety Specific Training (part I)
Assessment methods
There will be an oral exam, focusing on the topics presented during the course.
Students will be asked to prepare a small project, which will be discussed during the exam.
Teaching tools
Slides for the theory, use of real-world infrastructures for the hands-on sessions
Office hours
See the website of Daniele Cesini