Academic Year 2023/2024

  • Teaching Mode: Traditional lectures
  • Campus: Forli
  • Programme: Second cycle degree programme (LM) in Specialized translation (code 9174)

Learning outcomes

The student is able to independently identify a research problem relevant to a linguistics-, translation- and/or technology-related field; s/he is able to locate and efficiently use the tools and information sources needed to tackle a research problem in one or more areas of expertise; s/he is able to acquire further competences related to linguistics, translation and technology, as well as other disciplines relevant to her/his studies, through interaction with scholars from a range of fields.

Course contents

Creation of datasets for NLP

This module prepares the student to produce a (supervised) dataset from scratch, so as to have the material needed to train a model.

The module covers the following units:

  1. Definition of the problem, annotation scheme and guidelines.
  2. Collection of the instances (documents, sentences, social media posts).
  3. Annotation by experts.
  4. Annotation by crowdsourcing (platforms, inter-annotator agreement, consolidation).
  5. Ethical aspects of tasks, annotation and crowdsourcing.
  6. (time permitting) Training a model on your data: transformers applied to hate speech/misogyny/toxicity.
  7. (time permitting) Error analysis.
  8. Reporting the results (paper preparation).
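
As an illustration of unit 4, the sketch below computes inter-annotator agreement between two annotators with Cohen's kappa and consolidates several labels per instance by majority vote. It is a minimal example under stated assumptions: the function names and the toy "toxic"/"ok" labels are hypothetical, and Cohen's kappa and majority voting are only two common choices, not necessarily the ones adopted in the course.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same instances."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of instances with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: estimated from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label for every instance.
        return 1.0
    return (observed - expected) / (1 - expected)


def majority_vote(annotations):
    """Return the most frequent label, or None when the top labels tie."""
    counts = Counter(annotations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Toy data (hypothetical): two annotators, six instances.
annotator_a = ["toxic", "ok", "ok", "toxic", "ok", "ok"]
annotator_b = ["toxic", "ok", "toxic", "toxic", "ok", "ok"]
print(round(cohen_kappa(annotator_a, annotator_b), 3))

# Consolidating three crowdsourced labels for one instance.
print(majority_vote(["toxic", "ok", "toxic"]))
```

Returning None on a tie leaves the decision open, so tied instances can be sent to an expert or to an additional annotator instead of being resolved arbitrarily.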

Assessment methods

Grading scale

  • 30-30L: The student possesses an in-depth knowledge of the topic and an outstanding ability to apply the concepts. The student carries out rigorous formal experiments and produces an outstanding report, good enough to be considered for submission to a national conference in the field.
  • 27-29: The student possesses an in-depth knowledge of the topic, a sound ability to apply concepts, and good analytical skills. The student carries out good formal experiments and produces a high-quality report.
  • 24-26: The student possesses a fair knowledge of the topic and a reasonable ability to apply concepts correctly. The student carries out some reasonable experiments and produces a good report.
  • 21-23: The student possesses an adequate, but not in-depth, knowledge of the topic and a partial ability to apply concepts. The student carries out faulty experiments and produces a reasonable report.
  • 18-20: The student possesses a barely adequate, superficial knowledge of the topic and an inconsistent ability to apply concepts. The student carries out incorrect experiments and produces a deficient report.
  • < 18 Fail: The student possesses an inadequate knowledge of the topic and makes significant errors in applying concepts. Both experiments and report are poor.

Office hours

See the website of Luis Alberto Barron Cedeno