94009 - Language Technology Seminar

Academic Year 2023/2024

Learning outcomes

The goal of this seminar is to illustrate the main IT tools used to collect and analyze linguistic data, both in academia and in the language industry (cognitive computing, text analytics, computational lexicography, translation, etc.). The activities will focus on the process of building and searching digital linguistic resources such as corpora and databases.

Course contents

The seminar will focus on the main tasks and tools of natural language processing (NLP), in particular on the crawling, processing and annotation of textual linguistic data. Particular attention will be devoted to web crawling methodologies and to text pre-processing and mark-up protocols. The seminar will also introduce some preliminary notions on the main methods for evaluating the results of NLP systems.
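As a purely illustrative sketch of two of the topics above (text pre-processing and evaluation), the following Python snippet tokenizes a short text, counts word frequencies, and scores a made-up set of predicted annotations against a made-up gold standard. The function names, the sample sentence and the annotation sets are assumptions introduced here for illustration only. They are not course materials, and the tokenization rule is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract runs of word characters (a naive rule)."""
    return re.findall(r"\w+", text.lower())


def precision_recall_f1(predicted, gold):
    """Compare predicted items with gold items and return (precision, recall, F1)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sample text, used only for this illustration.
sample = "The seminar covers corpora. The corpora are annotated."
tokens = tokenize(sample)
frequencies = Counter(tokens)

# Hypothetical annotations: which tokens a system tagged as nouns vs. a gold standard.
gold_nouns = {"seminar", "corpora"}
predicted_nouns = {"seminar", "corpora", "covers"}
precision, recall, f1 = precision_recall_f1(predicted_nouns, gold_nouns)

print(frequencies.most_common(2))
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Precision is the share of predicted items that are correct, recall is the share of gold items that were found, and F1 is their harmonic mean. These are among the most common measures for evaluating NLP systems, although which measure is appropriate depends on the task.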

Readings/Bibliography

Background readings

  • Lenci, Alessandro, Simonetta Montemagni & Vito Pirrelli. 2016. Testo e computer. Elementi di linguistica computazionale. Roma: Carocci.
  • Nissim, Malvina & Ludovica Pannitto. 2022. Che cos’è la linguistica computazionale. Roma: Carocci.

Teaching methods

The course will be divided into two parts. The first part will consist of lectures introducing the basic tools, notions and methodologies of NLP, together with the main programming languages used in computational linguistics (Bash, Python). The second part will be more practical and lab-based, and will apply the acquired notions to specific tasks.

The lab work will be carried out partly in the classroom, with the tutor's assistance, and partly independently at home.

Assessment methods

Assessment consists of a series of practical tasks that will be assigned during the course. The final (more substantial) task, assigned at the end of the course, must be handed in to the lecturer and tutor via the Virtuale platform at least 10 days before the exam date.

Teaching tools

Slides will be projected to support lectures and labs. Some web-based computer tools and digital resources for linguistic analysis will be illustrated; however, for some activities you will need to download specific (free) software. Google Colab notebooks will be used for practical exercises, so participants will need a (free) Google account. The Virtuale platform will be used to make teaching materials available and to support the labs.

Office hours

See the website of Francesca Masini

SDGs

Quality education

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.