93236 - Language Technology Seminar (LM)

Academic Year 2022/2023

Learning outcomes

By the end of the seminar, students will have acquired specific skills and knowledge in the digital treatment of linguistic data that can be applied both in academic research and in the language industry. They will be able to engage with topics and issues in corpus linguistics, computational linguistics and digital humanities.

Course contents

The seminar will focus on the main tasks and tools of natural language processing (NLP), in particular the crawling, processing and annotation of textual linguistic data. Particular attention will be devoted to web crawling methodologies, text pre-processing and mark-up protocols. The seminar will also introduce some preliminary notions on the main methods for evaluating the output of NLP systems.

Readings/Bibliography

Background readings

  • Lenci, Alessandro, Simonetta Montemagni & Vito Pirrelli. 2016. Testo e computer. Elementi di linguistica computazionale. Roma: Carocci.
  • Nissim, Malvina & Ludovica Pannitto. 2022. Che cos’è la linguistica computazionale. Roma: Carocci.

Teaching methods

The course will be divided into two parts. The first part will consist of lectures introducing the basic tools, notions and methodologies of NLP, as well as the main programming languages used in computational linguistics (Bash, Python). The second part will be more practical and lab-based, and will apply the acquired notions to specific tasks.

The lab work will be carried out partly in the classroom, with the tutor's assistance, and partly independently at home.

Assessment methods

The assessment consists of a series of practical tasks assigned during the course. The final, more substantial task, assigned at the end of the course, must be submitted to the lecturer and tutor via the Virtuale platform at least 10 days before the exam date.

Teaching tools

Slides will be used to support lectures and lab sessions. Some web-based tools and digital resources for linguistic analysis will be illustrated; for some activities, however, you will need to download specific (free) programs. Google Colab notebooks will be used for practical exercises, so participants will need a (free) Google account. The Virtuale platform will be used to make teaching materials available and to support the labs.

Office hours

See the website of Francesca Masini

SDGs

Quality education

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.