B5685 - Selected Topics in Natural Language Processing

Academic Year 2025/2026

  • Teaching Mode: Traditional lectures
  • Campus: Forli
  • Degree programme: Second cycle degree programme (LM) in Specialized translation (cod. 9174)

Learning outcomes

The student knows the main formats for data annotation and the main strategies for sourcing annotations from experts and/or crowdsourcing platforms; s/he is able to define a complex problem in natural language processing settings, to identify the appropriate data that must be compiled in order to address it, and to implement solutions that go beyond supervised ones; s/he is also familiar with relevant topics in NLP and artificial intelligence in general, including ethical aspects and process upscaling.

Course contents

Creation of datasets for NLP

This module prepares the student to tackle the problem of producing a (supervised) dataset from scratch, in order to have the materials needed to train a model.

It covers the following topics:

  1. Definition of the problem, annotation scheme and guidelines.
  2. Collection of the instances (documents, sentences, social media posts).
  3. Annotation by experts.
  4. Annotation by crowdsourcing (platforms, inter-annotator agreement, consolidation).
  5. Ethical aspects of tasks, annotation and crowdsourcing.
  6. Prompting

Readings/Bibliography

  1. Sinclair, J. 2005. Developing linguistic corpora: a guide to good practice. Chapter 1: Corpus and Text - Basic Principles [https://users.ox.ac.uk/~martinw/dlc/chapter1.htm]. AHDS Literature, Language, and Linguistics.
  2. Surowiecki, J. 2004. The Wisdom of Crowds. Anchor.

 

More topic-specific materials will be provided over the semester.

Teaching methods

The course will be a combination of seminar and practical sessions. In either case, active participation of the students will be expected.

Assessment methods

The student will create a labelled corpus within his/her own research interests, applying the knowledge acquired during the course. Once the topic has been agreed with the instructor, the student will build the corpus and write a report.

Alternatively, the instructor can involve the student in an ongoing annotation effort within the department.

Grading scale

  • 30-30L: The student possesses an in-depth knowledge of the topic and an outstanding ability to apply the concepts. The student carries out rigorous formal experiments and produces an outstanding report, good enough to be considered for submission to a national conference in the field.
  • 27-29: The student possesses an in-depth knowledge of the topic, a sound ability to apply concepts, and good analytical skills. The student carries out good formal experiments and produces a high-quality report.
  • 24-26: The student possesses a fair knowledge of the topic and a reasonable ability to apply concepts correctly. The student carries out some reasonable experiments and produces a good report.
  • 21-23: The student possesses an adequate, but not in-depth, knowledge of the topic and a partial ability to apply concepts. The student carries out flawed experiments and produces a reasonable report.
  • 18-20: The student possesses a barely adequate and only superficial knowledge of the topic and only an inconsistent ability to apply concepts. The student carries out incorrect experiments and produces a deficient report.
  • < 18 Fail: The student possesses an inadequate knowledge of the topic and makes significant errors in applying concepts. Both experiments and report are poor.

Students with specific learning difficulties (SpLD) or with disabilities that can affect their ability to attend courses are invited to contact the University service for students with disabilities and SLD at the earliest opportunity, ideally before the start of the course. The University service will suggest possible adjustments to the course work and/or exam, which must then be submitted to the course leader, who will assess their feasibility in line with the learning objectives of the course. Please note that adjustments to the exam must be requested at least two weeks in advance.

Office hours

See the website of Luis Alberto Barron Cedeno