82023 - Information Mining and Terminology (CL1)


This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education; Partnerships for the goals

Academic Year 2021/2022

Learning outcomes

The student knows and can effectively use the main resources for information mining and terminology management, whether paper-based or digital; has terminological and terminographic knowledge of one or more sub-languages and domains, as required for translation purposes; is able to devise, manage and evaluate complex terminography projects involving several professionals and a variety of skills and competences, in a way that is consistent with professional ethics; and is able to acquire higher-level knowledge and competences in terminology/terminography and information mining autonomously, and to apply them to novel fields.

Course contents

The "Information Mining and Terminology" (InMiTe) module is one of the two modules that make up the "Technologies for Translation" course, together with "Computer-Assisted Translation and Post-editing" (CatPed); the latter is taught by Prof. Claudia Lecci.

The InMiTe module is in turn split into two sub-modules: the first ("Information Mining") is taught by Prof. Adriano Ferraresi, the second ("Terminology") by Prof. Christian Olalla Soler.

The "Information Mining" sub-module, which will be held during the first 5 weeks of the semester, presents the main online and offline tools and methods to retrieve and process information both for specialised translation and revision tasks, and for terminological and terminographic applications. Specifically, the following contents will be covered:

  • Advanced techniques for web searching using online search engines;
  • Construction of specialised electronic corpora, both adopting manual methods and semi-automatic ones;
  • Methods to retrieve terminological information in specialised corpora (concordances, clusters, collocates, frequency lists, keywords);
  • Advanced methods for consultation of online reference corpora (Corpus Query Language).
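As a rough illustration of two of the retrieval methods listed above (frequency lists and concordances), the following sketch uses only the Python standard library. It is not one of the tools presented in the course; the function names, the simple tokenizer and the sample text are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens, top=5):
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)


def concordance(tokens, node, width=3):
    """Return keyword-in-context lines: up to `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text, standing in for a specialised corpus.
sample = (
    "The termbase stores each term with its definition. "
    "A term may have several variants, and each term belongs to a domain."
)
tokens = tokenize(sample)

for left, node, right in concordance(tokens, "term"):
    print(f"{left:>25} | {node} | {right}")
print(frequency_list(tokens, top=3))
```

Dedicated concordancers add sorting, part-of-speech filtering and collocation statistics on top of this basic idea.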

The "Terminology" sub-module, which will be held during the last 5 weeks of the semester, presents the main concepts and methods of terminology and terminography which will allow the students to carry out their terminology project. The following contents will be covered:

  • Core concepts of terminology: general vs. specialized language; the concept of term; term relationships; neologisms; the link between culture, metaphors and language; socioterminology.
  • Methods to extract terminological units.
  • Methods to systematize and visualize relationships among terminological units.
  • Methods to plan, structure and feed terminology databases and glossaries with a special focus on OmegaT and SDL MultiTerm.
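One widely used way to extract candidate terminological units is keyword analysis: comparing word frequencies in a specialised corpus against a reference corpus. The sketch below uses the log-likelihood statistic for this comparison. The course description does not say which statistic or tool is used, so the choice of measure, the function names and the toy data are all assumptions.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word occurring a times in a corpus of
    c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the specialised corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(spec_tokens, ref_tokens, top=3):
    """Rank words that are relatively more frequent in the specialised corpus."""
    spec, ref = Counter(spec_tokens), Counter(ref_tokens)
    c, d = len(spec_tokens), len(ref_tokens)
    scored = []
    for word, a in spec.items():
        b = ref.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Invented toy corpora; real corpora would contain thousands of tokens.
spec_tokens = "stent stent stent the the artery".split()
ref_tokens = "the the the the cat sat on the mat today".split()

for word, score in keywords(spec_tokens, ref_tokens):
    print(f"{word}\t{score:.2f}")
```

Keywords found this way are only candidates: deciding whether a keyword is a term of the domain remains a terminological judgement.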


Suggested readings

  • Cabré Castellví, M. T. (1999). Terminology. Theory, methods and applications. John Benjamins.
  • Cabré Castellví, M. T. (2010). "Terminology and Translation". In: Gambier, Yves & Luc van Doorslaer (eds.), Handbook of Translation Studies. Volume 1. John Benjamins.
  • Crawford, W. & Csomay, E. (2016). Doing corpus linguistics. Routledge.
  • Melby, A. K. (2012). "Terminology in the age of multilingual corpora". JoSTrans - The Journal of Specialised Translation 18.
  • Sinclair, J. M. (1996). "The search for units of meaning". Textus 9(1): 75–106.
  • Zanettin, F. (2012). Translation-Driven Corpora: Corpus Resources for Descriptive and Applied Translation Studies. Routledge.

Teaching methods

Lessons are delivered in the form of lectures and workshops, and combine theoretical contents and a strong practical and applied component.

Theoretical contents are delivered through presentations by the lecturer, and their acquisition is tested by means of in-class discussion, as well as in the final examination.

The applied part consists of hands-on practice in the lab and homework exercises, which are discussed during troubleshooting sessions in the following class. Students will be asked to hand in some of these exercises, either individually or in small groups, in preparation for the exam; the lecturer will provide detailed feedback on them. These activities are aimed at constantly monitoring progress in the development of the technological skills that are the object of the course.

Students are expected to attend at least 70% of the module classes.

All students must attend Modules 1 and 2 on Health and Safety online.

Assessment methods

The "Information Mining and Terminology" module will be assessed through a single final examination, which will be evaluated jointly by the instructors of the two sub-modules.

Students will hand in an individual information mining and terminology project, and will discuss it during an oral exam which also tests their knowledge of the technological tools and theoretical principles presented during the course.

The project, which must be handed in via email or Moodle one week before each exam session ("appello"), will consist of: a) a pool of bilingual specialised corpora built both manually and semi-automatically, accompanied by a brief report on the corpus construction techniques adopted (readme file); b) a terminological database (in tab-separated and SDL MultiTerm-compliant formats) and the matching conceptual systems, through which the typical terminology of the domain represented in the corpus is systematized.
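To give an idea of what a tab-separated terminological database can look like, here is a minimal sketch that writes and reads one with Python's standard csv module. The field names and the sample entries are hypothetical; the fields actually required for the project, and the SDL MultiTerm-compliant format, are defined by the instructors and are not reproduced here.

```python
import csv
import io

# Hypothetical field names, chosen only for this illustration.
FIELDS = ["term_en", "term_it", "domain", "definition_en"]

# Invented sample entries.
entries = [
    {"term_en": "stent", "term_it": "stent", "domain": "cardiology",
     "definition_en": "Small mesh tube placed in a vessel to keep it open."},
    {"term_en": "coronary artery", "term_it": "arteria coronaria", "domain": "cardiology",
     "definition_en": "Artery that supplies blood to the heart muscle."},
]


def write_tsv(rows, handle):
    """Write the entries as tab-separated values, one term entry per line, with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_tsv(handle):
    """Read tab-separated entries back into a list of dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


# An in-memory buffer stands in for a file such as open("termbase.tsv", "w", encoding="utf-8", newline="").
buffer = io.StringIO()
write_tsv(entries, buffer)
print(buffer.getvalue())
```

Keeping one entry per line and one field per column makes the file easy to check by eye and to import into other tools.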

The assessment will focus on three areas:

1. Quality of the project; worth 20 points. Evaluation criteria: originality in the choice of the domain for terminological inquiry; thoroughness in the sampling of relevant texts for corpus construction; clarity in the systematization of terminology (conceptual systems); formal correctness of the terminological databases. Students will receive detailed feedback on their project before the oral exam, during which they will be invited to discuss their work and justify their choices.

2. Knowledge of the theoretical foundations of terminology/terminography; worth 3 points. Evaluation criteria: relevance of the answer with respect to the question; ability to present the key concepts of the discipline. Knowledge will be assessed through a question during the oral exam.

3. Practical skills concerning the use of the pieces of software presented in class; worth 7 points. Evaluation criteria: ability to apply techniques for linguistic and terminological information search to new domains. These skills will be tested through a practical exercise during the oral exam.

For each of these three areas, assessment is based on the following scale:

  • 100% of the score: excellent skills and knowledge with reference to the evaluation criteria;
  • 90% of the score: very good skills and knowledge with reference to the evaluation criteria;
  • 80% of the score: good skills and knowledge with reference to the evaluation criteria;
  • 70% of the score: adequate skills and knowledge with reference to the evaluation criteria;
  • 60% of the score: sufficient skills and knowledge with reference to the evaluation criteria;
  • <60% of the score: insufficient skills and knowledge with reference to the evaluation criteria.

The mark of the InMiTe module will result from the sum of the points obtained in the three parts of the exam.

The final mark of the "Technologies for Translation" course will be calculated as the arithmetic mean of the marks obtained in the InMiTe and CatPed modules.
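The calculation described above can be sketched as follows. The maximum points per area (20, 3 and 7) and the use of the arithmetic mean come from this page; the function names, the sample percentages and the CatPed mark are invented for the example, and rounding rules are not specified here, so none are applied.

```python
# Maximum points per assessment area, as stated on this page (total: 30).
MAX_POINTS = {"project": 20, "theory": 3, "practical": 7}


def module_mark(percentages):
    """Sum the points of the three areas, given the percentage of the score awarded in each."""
    return sum(MAX_POINTS[area] * percentages[area] / 100 for area in MAX_POINTS)


def course_mark(inmite, catped):
    """Arithmetic mean of the InMiTe and CatPed module marks."""
    return (inmite + catped) / 2


# Hypothetical student: very good project, excellent theory, good practical skills.
inmite = module_mark({"project": 90, "theory": 100, "practical": 80})
print(f"InMiTe mark: {inmite:.1f} out of 30")
print(f"Course mark with a hypothetical CatPed mark of 28: {course_mark(inmite, 28):.1f}")
```

In this example the student earns 18 + 3 + 5.6 = 26.6 points for the InMiTe module, and the mean with a CatPed mark of 28 is 27.3.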

Teaching tools

Both frontal and workshop-like lectures will be delivered in a computer lab equipped with a PC and an overhead projector, so as to guarantee that lessons can be attended in person or remotely, and that the lecturer can switch from one teaching mode to the other if need be. 

Frontal lectures will provide the necessary theoretical and methodological foundations of the discipline. These lectures will be followed by lessons in the form of workshops, during which substantial time will be devoted to practical hands-on exercises, focusing on the main software applications used in the field of information mining, both proprietary and open-source/free. Students will be able to carry out the exercises whether they are in the lab or are attending lessons online. 

Support materials (videos, sample texts, slides, project files, instructions etc.) are made available through the Moodle e-learning platform.

Office hours

See the website of Adriano Ferraresi

See the website of Christian Olalla Soler