B1873 - Machine Learning for Humanities (1) (LM)

Academic Year 2024/2025

                
Teacher: Giovanni Colavizza
Credits: 6
SSD: INF/01
Language: English
Teaching Mode: Traditional lectures
Campus: Bologna
Degree programme: Second cycle degree programme (LM) in Digital Humanities and Digital Knowledge (cod. 9224)

Course Timetable: from Sep 16, 2024 to Oct 22, 2024

Learning outcomes

At the end of the course, the student is familiar with the theoretical principles underpinning modern machine learning. The student is further able to understand, apply and evaluate the main machine learning techniques and implementations relevant to addressing practical problems and tasks in the domains of Cultural Heritage and GLAM. Lastly, the student is able to critically reflect on the preconditions and implications of using machine learning in these domains.

Course contents

This course offers an introduction to Machine Learning (ML), with a focus on applications in the Arts and Humanities. The course will introduce foundational ML concepts and methods, and will be complemented by laboratory activities where methods will be implemented.

After completing this course, the student can:

Understand basic and advanced Machine Learning concepts and methods.
Find software libraries that can be used to develop ML applications.
Implement ML applications using Python.
Evaluate whether ML could be used in an Arts and Humanities task.

Machine Learning (ML) is increasingly used in the context of Arts & Humanities research and GLAM applications (Galleries, Libraries, Archives, Museums). Examples range from text recognition and information extraction from historical sources to image search and analysis on artwork collections, from automatic 3D reconstructions of built heritage to the automatic detection of archeological sites from satellite or drone images. This course will lay the foundations for the students to explore and implement similar applications and more.

The breakdown of the topics is as follows (per week):

Week 1: Introduction to Machine Learning, part 1. We discuss the course setup, the fundamentals of machine learning, the types of ML tasks, the key components of an ML workflow, some foundational mathematical concepts, and linear regression. We implement linear regression in numpy.
Week 2: Introduction to Machine Learning, part 2. We discuss the worked-out examples of linear regression, linear classification, and the Multi-Layer Perceptron (MLP), and implement them in numpy.
Week 3: PyTorch and Machine Vision. We introduce PyTorch, discuss tasks in machine vision and the main architectures to work with images (convolutional and residual networks). We implement an image-based task in PyTorch.
Week 4: Language Processing. We introduce tasks in natural language processing and the Transformer, the main architecture to work with texts. We implement text-based tasks in PyTorch.
Week 5: Generative AI. We discuss generative AI models, in particular focusing on Large Language Models (LLMs). We see how to use LLMs in practice and mention advanced applications such as Retrieval Augmented Generation (RAG). We implement a RAG-enabled chatbot using LlamaIndex and Chainlit.

Note that this list of topics is tentative and might still change slightly.
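
As an illustration of the kind of exercise Week 1 describes, the following is a minimal sketch of linear regression in numpy. It is not taken from the course materials: the function names, the synthetic data, and the choice of a closed-form least-squares fit (rather than, say, gradient descent) are all assumptions made for this example.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Least-squares fit of y ~ X @ w + b (hypothetical helper, not course code)."""
    # Append a column of ones so the intercept b is learned as an extra weight.
    X_aug = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef[:-1], coef[-1]


def predict(X, w, b):
    """Apply the fitted linear model to new inputs."""
    return X @ w + b


# Synthetic data (an assumption for illustration): y = 3x + 0.5 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(0, 0.01, size=100)

w, b = fit_linear_regression(X, y)
y_hat = predict(X, w, b)
mse = np.mean((y - y_hat) ** 2)  # mean squared error on the training data
```

Here `w` and `b` should come out close to the slope and intercept used to generate the data, and `mse` should be close to the variance of the added noise.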

Readings/Bibliography

The materials will be provided via GitHub. The students are expected to take their own notes.

BOOK: Zhang et al., Dive Into Deep Learning, MIT Press, 2023. https://d2l.ai/index.html.

Readings:

Alammar, Jay. The Illustrated Transformer, 2018. https://jalammar.github.io/illustrated-transformer.
Riedl, Mark. A Very Gentle Introduction to LLMs, 2023. https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e.
OpenAI. GPT-4, 2023. https://openai.com/index/gpt-4-research.
At least one of the following:
- Colavizza, Giovanni, Tobias Blanke, Charles Jeurgens, and Julia Noordegraaf. “Archives and AI: An Overview of Current Debates and Future Perspectives.” Journal on Computing and Cultural Heritage 15, no. 1 (February 28, 2022): 1–15. https://doi.org/10.1145/3479010.
- Fiorucci, Marco, Marina Khoroshiltseva, Massimiliano Pontil, Arianna Traviglia, Alessio Del Bue, and Stuart James. “Machine Learning for Cultural Heritage: A Survey.” Pattern Recognition Letters 133 (May 2020): 102–8. https://doi.org/10.1016/j.patrec.2020.02.017.
- Lombardi, Francesco, and Simone Marinai. “Deep Learning for Historical Document Analysis and Recognition—A Survey.” Journal of Imaging 6, no. 10 (October 16, 2020): 110. https://doi.org/10.3390/jimaging6100110.
- Santos, Iria, Luz Castro, Nereida Rodriguez-Fernandez, Álvaro Torrente-Patiño, and Adrián Carballal. “Artificial Neural Networks and Deep Learning in the Visual Arts: A Review.” Neural Computing and Applications 33, no. 1 (January 2021): 121–57. https://doi.org/10.1007/s00521-020-05565-4.
- Sommerschield, Thea, Yannis Assael, John Pavlopoulos, Vanessa Stefanak, Andrew Senior, Chris Dyer, John Bodel, Jonathan Prag, Ion Androutsopoulos, and Nando De Freitas. “Machine Learning for Ancient Languages: A Survey.” Computational Linguistics 49, no. 3 (September 1, 2023): 703–47. https://doi.org/10.1162/coli_a_00481.
- Wevers, Melvin, and Thomas Smits. “The Visual Digital Turn: Using Neural Networks to Study Historical Images.” Digital Scholarship in the Humanities, January 18, 2019. https://doi.org/10.1093/llc/fqy085.

Visual Neural Networks course by 3Blue1Brown: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi.

Further resources: Awesome AI 4 LAM. https://ai4lam.github.io/awesome-ai4lam.

Teaching methods

Lectures and live coding sessions. Attending students are expected to come prepared to class.

Assessment methods

Oral exam on all course contents (50%) and individual project (50%). The student may select a topic for the project in agreement with the lecturer. The project must be applied, i.e., it must entail a substantial amount of coding and the development of an ML application of the student's choice. The project must be sent to the lecturer at least 5 days before the oral exam date. Project guidelines will be provided at the beginning of the course.

The program for non-attending students is the same.

Recommended prior knowledge

You need to know how to code in Python. High-school algebra and calculus are also expected (you can refresh them using the first reading listed under Readings/Bibliography). Students will also benefit from having attended first-year courses such as ‘Computational Thinking’.

Teaching tools

Slides, live coding, demonstrations, readings, and seminar discussions.

Classes are held in a classroom equipped with personal computers connected to the Internet.

Links to further information

https://github.com/Giovanni1085/UNIBO_MachineLearning

Office hours

See the website of Giovanni Colavizza