95630 - Big Data and Cloud Platforms

Academic Year 2022/2023

  • Teacher: Matteo Francia
  • Credits: 6
  • SSD: ING-INF/05
  • Language: English
  • Modules: Enrico Gallinucci (Module 1), Matteo Francia (Module 2)
  • Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
  • Campus: Cesena
  • Course: Second cycle degree programme (LM) in Digital Transformation Management (cod. 5815)

Learning outcomes

At the end of the course, the student:

  • Knows the applications of Big Data technologies and the respective challenges
  • Knows the hardware and software architectures proposed to handle Big Data
  • Knows the techniques to store the data and the fundamental aspects of new-generation database systems
  • Knows the programming paradigms generally adopted in these kinds of systems and the main analysis methodologies (batch, interactive, streaming)
  • Learns the design patterns that regulate the deployment in the Cloud of complex ICT solutions
  • Learns some of the most relevant components of the Cloud Platforms, with a specific focus on those services that enable Big Data management and IoT applications
  • Is able to make decisions concerning the appropriate Cloud Platform and the related services to be adopted
  • Knows the billing models that lie behind Cloud Computing services and learns how to estimate the cost of a specific solution, to support project management, to prepare quotations, or to support the management control system
  • Acquires practical expertise through laboratory activities in using some of the main open-source Big Data software tools, as well as some of the most adopted Cloud Computing services available on the market

Course contents

Big Data Architectures and Paradigms

  • Hardware infrastructures and software architectures
  • Data storage in distributed file systems and NoSQL databases
  • The MapReduce programming paradigm
  • Main principles of application design and optimization based on Apache Spark
  • Architectures and algorithms to handle streams of data

Handling Big Data in the Cloud

  • Introduction to data platforms: shifting from databases to well-integrated data ecosystems
  • Definition of cloud and taxonomy of cloud services
  • Introduction to the most relevant Cloud Platforms, with a specific focus on those services that enable data platforms and IoT applications
  • Introduction to the billing models that lie behind Cloud Computing services. Cluster migration: on-premises clusters vs clusters in the cloud
  • Deployment of real case studies on a cloud provider

Seminars by companies working with cloud and big data platforms

Readings/Bibliography

  • Slides

Recommended readings:

  • Ian Foster, Dennis Gannon. Cloud Computing for Science and Engineering. MIT Press, 2017
  • Danil Zburivsky, Lynda Partner. Designing Cloud Data Platforms. Simon and Schuster, 2021
  • Tom White. Hadoop - The Definitive Guide (4th edition). O'Reilly, 2015
  • Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee. Learning Spark, 2nd Edition. O'Reilly, 2020
  • Andrew G. Psaltis. Streaming Data - Understanding the real-time pipeline. Manning, 2017

Further readings will be mentioned during the course.

Teaching methods

Lessons and (mainly guided) practical exercises.

Given the teaching methods of this course unit, all students must attend Modules 1 and 2 on Health and Safety online.

Assessment methods

The exam consists of an oral examination on all the covered topics.

Teaching tools

Cloud/big data services are accessed through Amazon Web Services and/or Google Cloud Platform via coupons.

Office hours

See the website of Matteo Francia

See the website of Enrico Gallinucci

SDGs

  • Quality education
  • Industry, innovation and infrastructure

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.