B1873 - Machine Learning for Humanities (1) (LM)

Academic Year 2024/2025

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Digital Humanities and Digital Knowledge (cod. 9224)

Learning outcomes

At the end of the course, the student is familiar with the theoretical principles underpinning modern machine learning. The student is further able to understand, apply and evaluate the main machine learning techniques and implementations relevant to addressing practical problems and tasks in the domains of Cultural Heritage and GLAM. Lastly, the student is able to critically reflect on the preconditions and implications of using machine learning in these domains.

Course contents

This course offers an introduction to Machine Learning (ML), with a focus on applications in the Arts and Humanities. The course introduces foundational ML concepts and methods and is complemented by laboratory activities in which these methods are implemented.

After completing this course, the student can:

  • Understand basic and advanced Machine Learning concepts and methods.

  • Find software libraries that can be used to develop ML applications.

  • Implement ML applications using Python.

  • Evaluate whether ML is suitable for a given Arts and Humanities task.


Machine Learning is increasingly used in Arts & Humanities research and in GLAM applications (Galleries, Libraries, Archives, Museums). Examples range from text recognition and information extraction from historical sources to image search and analysis in artwork collections, and from automatic 3D reconstruction of built heritage to the automatic detection of archaeological sites in satellite or drone images. This course lays the foundations for students to explore and implement these and similar applications.

The breakdown of the topics is as follows (per week):

  1. Week 1: Introduction to Machine Learning, part 1. We discuss the course setup, the fundamentals of machine learning, the types of ML tasks, the key components of an ML workflow, some foundational mathematical concepts, and linear regression. We implement linear regression in NumPy.

  2. Week 2: Introduction to Machine Learning, part 2. We work through examples of linear regression, linear classification, and the Multi-Layer Perceptron (MLP), and implement them in NumPy.

  3. Week 3: PyTorch. We introduce PyTorch and use it to implement linear regression, linear classification, and the MLP. We also mention scikit-learn and Hugging Face, as well as specialized architectures for textual (recurrent networks) and visual (convolutional networks) tasks.

  4. Week 4: The Transformer. We introduce and work with the transformer architecture, which underpins most modern ML applications.

  5. Week 5: Generative AI. We discuss generative AI models, focusing in particular on Large Language Models (LLMs). We discuss how to use LLMs and mention more advanced applications (e.g., Retrieval-Augmented Generation).
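As an illustration of the kind of exercise planned for Week 1, the following is a minimal sketch (not the course's own lab material) of linear regression fitted by batch gradient descent on the mean squared error in NumPy. The function name `fit_linear_regression`, the synthetic data, the learning rate and the number of steps are all assumptions chosen for illustration.

```python
import numpy as np


def fit_linear_regression(X, y, lr=0.1, steps=500):
    """Fit y ≈ X @ w + b by batch gradient descent on the mean squared error.

    Hypothetical helper for illustration; lr and steps are assumed values.
    """
    n, d = X.shape
    w = np.zeros(d)  # weights, initialised to zero
    b = 0.0          # bias
    for _ in range(steps):
        err = X @ w + b - y              # residuals of the current predictions
        grad_w = 2 * X.T @ err / n       # gradient of the MSE w.r.t. w
        grad_b = 2 * err.mean()          # gradient of the MSE w.r.t. b
        w -= lr * grad_w                 # gradient descent update
        b -= lr * grad_b
    return w, b


# Synthetic data (illustrative only): y = 3x + 2 plus small Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 2 + 0.1 * rng.normal(size=100)

w, b = fit_linear_regression(X, y)
print(w, b)  # should be close to the true slope 3 and intercept 2
```

The linear classifier and the MLP of Week 2 broadly follow the same loop: compute predictions, a loss and its gradient, then update the parameters.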

Note that this list of topics is tentative and might still change slightly.

Readings/Bibliography

The materials will be provided via GitHub. The students are expected to take their own notes.

Readings

The following works serve as references; more will be provided during the course:

  • Zhang, Aston et al., Dive into Deep Learning, Cambridge University Press, 2023. https://d2l.ai/index.html

  • Alammar, Jay, The Illustrated Transformer. https://jalammar.github.io/illustrated-transformer

  • Riedl, Mark, A Very Gentle Introduction to LLMs. https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e

Teaching methods

Lectures and live coding sessions. Attending students are expected to come prepared to class.

Assessment methods

Individual oral exam on all course contents (100%). Optionally, the student may write a short essay (theoretical, state of the art) or carry out a personal project (applied), further exploring an application area discussed in class or another one of their choice. The essay/project must be sent to the lecturer at least 5 days before the oral exam date. In this case, the essay/project counts for 50% of the final grade and the oral exam for the remaining 50%. Essay/project guidelines will be provided at the beginning of the course. Students are encouraged to do the essay/project, as it allows them to explore a topic of their choice and lightens their oral examination.

The approach for non-attending students is the same.

Recommended prior knowledge

You need to know how to code in Python. High-school algebra and calculus are also expected (you can refresh them using the first reading listed under Readings). Students will also benefit from having attended first-year courses such as ‘Computational Thinking’.

Teaching tools

Slides, live coding, demonstrations, readings, and seminar discussions.

Classes are held in a classroom equipped with personal computers connected to the Internet.

Office hours

See the website of Giovanni Colavizza.