- Teacher: Luis Alberto Barron Cedeno
- Credits: 3
- SSD: L-LIN/02
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Forli
Course:
Second cycle degree programme (LM) in Specialized translation (code 9174)
from Oct 09, 2024 to Dec 11, 2024
Learning outcomes
The student knows the main formats for data annotation and the main strategies for sourcing annotations from experts and/or crowdsourcing platforms. S/he is able to define a complex problem in a natural language processing (NLP) setting, to identify the data that must be compiled to address it, and to implement solutions that go beyond supervised ones. S/he is also familiar with relevant topics in NLP and in artificial intelligence in general, including ethical aspects and process upscaling.
Course contents
Creation of datasets for NLP
This module prepares the student to produce a (supervised) dataset from scratch, providing the material needed to train a model.
The module covers the following topics:
- Definition of the problem, annotation scheme and guidelines.
- Collection of the instances (documents, sentences, social media posts).
- Annotation by experts.
- Annotation by crowdsourcing (platforms, inter-annotator agreement, consolidation).
- Ethical aspects of tasks, annotation and crowdsourcing.
- Prompting.
Assessment methods
Grading scale
- 30-30L: The student possesses an in-depth knowledge of the topic and an outstanding ability to apply the concepts. The student carries out rigorous formal experiments and produces an outstanding report, of sufficient quality to be considered for submission to a national conference in the field.
- 27-29: The student possesses an in-depth knowledge of the topic, a sound ability to apply concepts, and good analytical skills. The student carries out good formal experiments and produces a high-quality report.
- 24-26: The student possesses a fair knowledge of the topic and a reasonable ability to apply concepts correctly. The student carries out some reasonable experiments and produces a good report.
- 21-23: The student possesses an adequate, but not in-depth, knowledge of the topic and a partial ability to apply concepts. The student carries out faulty experiments and produces a reasonable report.
- 18-20: The student possesses a barely adequate and only superficial knowledge of the topic and only an inconsistent ability to apply concepts. The student carries out wrong experiments and produces a deficient report.
- < 18 (Fail): The student possesses an inadequate knowledge of the topic and makes significant errors in applying concepts. Both experiments and report are poor.
Office hours
See the website of Luis Alberto Barron Cedeno