- Teacher: Matteo Francia
- Credits: 6
- SSD: ING-INF/05
- Language: English
- Modules: Enrico Gallinucci (Module 1), Matteo Francia (Module 2)
- Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2)
- Campus: Cesena
- Course: Second cycle degree programme (LM) in Digital Transformation Management (code 5815)
- From Sep 19, 2023 to Oct 25, 2023
- From Nov 14, 2023 to Dec 13, 2023
Learning outcomes
At the end of the course, the student:
- Knows the applications of Big Data technologies and the respective challenges
- Knows the hardware and software architectures proposed to handle Big Data
- Knows the techniques to store the data and the fundamental aspects of new-generation database systems
- Knows the programming paradigms generally adopted in this kind of system and the main analysis methodologies (batch, interactive, streaming)
- Learns the design patterns that regulate the deployment in the Cloud of complex ICT solutions
- Learns some of the most relevant components of the Cloud Platforms, with a specific focus on those services that enable Big Data management and IoT applications
- Is able to make decisions concerning the appropriate Cloud Platform and the related services to be adopted
- Knows the billing models that lie behind Cloud Computing services and learns how to estimate the cost of a specific solution, to support project management, to prepare quotations, or to support the management control system
- Acquires practical expertise through laboratory activities in using some of the main open-source Big Data software tools, as well as some of the most widely adopted Cloud Computing services available on the market
Course contents
Big Data Architectures and Paradigms
- Hardware infrastructures and software architectures
- Data storage in distributed file systems and NoSQL databases
- The MapReduce programming paradigm
- Main principles of application design and optimization based on Apache Spark
- Architectures and algorithms to handle streams of data
Handling Big Data in the Cloud
- Introduction to data platforms: shifting from databases to well-integrated data ecosystems
- Definition of cloud and taxonomy of cloud services
- Introduction to the most relevant Cloud Platforms, with a specific focus on those services that enable data platforms and IoT applications
- Introduction to the billing models that lie behind Cloud Computing services. Cluster migration: cluster on-premises vs. in the cloud
- Deployment of real case studies on a cloud provider
Seminars by companies working with cloud and big data platforms
Readings/Bibliography
- Slides
Recommended readings:
- Ian Foster, Dennis Gannon. Cloud Computing for Science and Engineering. MIT Press, 2017
- Danil Zburivsky, Lynda Partner. Designing Cloud Data Platforms. Simon and Schuster, 2021
- Tom White. Hadoop - The Definitive Guide (4th edition). O'Reilly, 2015
- Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee. Learning Spark, 2nd Edition. O'Reilly, 2020
- Andrew G. Psaltis. Streaming Data - Understanding the real-time pipeline. Manning, 2017
Further readings will be mentioned during the course.
Teaching methods
Lessons and (mainly guided) practical exercises.
Given the teaching methods of this course unit, all students must attend Modules 1 and 2 on Health and Safety online.
Assessment methods
The exam consists of an oral examination on all the covered topics.
Teaching tools
Cloud/big data services are accessed through Amazon Web Services and/or Google Cloud Platform via coupons.
Office hours
See the website of Matteo Francia
See the website of Enrico Gallinucci
SDGs
This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.