95630 - BIG DATA AND CLOUD PLATFORMS

Academic Year 2022/2023

  • Teacher: Matteo Francia
  • Credits: 6
  • SSD: ING-INF/05
  • Teaching language: English
  • Modules: Enrico Gallinucci (Module 1), Matteo Francia (Module 2)
  • Teaching mode: Traditional, in-person lectures (Module 1); Traditional, in-person lectures (Module 2)
  • Campus: Cesena
  • Degree programme: Second cycle degree (Laurea Magistrale) in Digital Transformation Management (code 5815)

Learning outcomes

At the end of the course, the student:

  • Knows the applications of Big Data technologies and the respective challenges
  • Knows the hardware and software architectures proposed to handle Big Data
  • Knows the techniques to store the data and the fundamental aspects of new-generation database systems
  • Knows the programming paradigms generally adopted in this kind of system and the main analysis methodologies (batch, interactive, streaming)
  • Learns the design patterns that regulate the deployment of complex ICT solutions in the Cloud
  • Learns some of the most relevant components of Cloud Platforms, with a specific focus on those services that enable Big Data management and IoT applications
  • Is able to make decisions concerning the appropriate Cloud Platform and the related services to be adopted
  • Knows the billing models that lie behind Cloud Computing services and learns how to estimate the cost of a specific solution, to support project management, to prepare quotations, or to support the management control system
  • Acquires practical expertise through laboratory activities in using some of the main open-source Big Data software tools, as well as some of the most widely adopted Cloud Computing services available on the market

Course contents

Big Data Architectures and Paradigms

  • Hardware infrastructures and software architectures
  • Data storage in distributed file systems and NoSQL databases
  • The MapReduce programming paradigm
  • Main principles of application design and optimization based on Apache Spark
  • Architectures and algorithms to handle streams of data

Handling Big Data in the Cloud

  • Introduction to data platforms: shifting from databases to well-integrated data ecosystems
  • Definition of cloud and taxonomy of cloud services
  • Introduction to the most relevant Cloud Platforms, with a specific focus on those services that enable data platforms and IoT applications
  • Introduction to the billing models that lie behind Cloud Computing services. Cluster migration: cluster on-premises vs in the cloud
  • Deployment of real case studies on a cloud provider

Seminars by companies working with cloud and big data platforms

Readings/Bibliography

  • Slides

Recommended readings:

  • Ian Foster, Dennis Gannon. Cloud Computing for Science and Engineering. MIT Press, 2017
  • Danil Zburivsky, Lynda Partner. Designing Cloud Data Platforms. Simon and Schuster, 2021
  • Tom White. Hadoop - The Definitive Guide (4th edition). O'Reilly, 2015
  • Matei Zaharia, Holden Karau, Andy Konwinski, Patrick Wendell. Learning Spark, 2nd Edition. O'Reilly, 2020
  • Andrew G. Psaltis. Streaming Data - Understanding the real-time pipeline. Manning, 2017

Further readings will be mentioned during the course.

Teaching methods

Lessons and (mainly guided) practical exercises.

Given the teaching methods of this course unit, all students must attend Modules 1 and 2 on Health and Safety online.

Assessment methods

The exam consists of an oral examination on all the covered topics.

Teaching tools

Cloud/big data services are accessed through Amazon Web Services and/or Google Cloud Platform via coupons.

Office hours

See the website of Matteo Francia

See the website of Enrico Gallinucci

SDGs

Quality education; Industry, innovation and infrastructure

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.