84883 - Terminology and Information Mining (CL2)

Course Unit Page

Academic Year 2018/2019

Learning outcomes

The student:

- knows and is able to use effectively the main resources for terminology management and information mining, be they paper-based or in digital form;

- has terminological and terminographic knowledge of one or more sub-languages and domains, as required for translation purposes;

- is able to devise, manage and evaluate complex terminography projects, involving several professionals and a variety of skills and competences, in a way that is consistent with professional ethics;

- is able to acquire higher-level knowledge and competences in the areas of terminology/terminography and information mining independently, and to apply them to novel fields.

Course contents

The "Terminology and Information Mining" (TerMine) module is one of the three modules that make up the "Translation Technology and methods" course, together with "Computer-Assisted Translation and Web Localization" (CatLoc) and "Machine Translation and Post-editing" (MatPed); the latter two are held by Prof. Claudia Lecci.

The TerMine module itself has two components.

The first component (first 5 weeks) presents the main online and offline tools and methods to retrieve and process information both for specialised translation and revision tasks, and specifically for terminological and terminographic applications. Following an overview of advanced techniques for web searching using search engines, special emphasis will be placed on the construction and use of electronic corpora, both specialised and general-purpose ones, and on advanced methods for text manipulation and corpus consultation (regular expressions, Corpus Query Language).
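As an illustration of the kind of regular-expression search mentioned above, the following is a minimal Python sketch of a keyword-in-context (KWIC) concordance. It is not course material: the function name `kwic`, the sample text and the search pattern are all assumptions chosen for the example, and dedicated corpus tools offer far richer query options.

```python
import re


def kwic(text, pattern, width=30):
    """Return one keyword-in-context line per regex match in text."""
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Hypothetical sample text standing in for a specialised corpus.
sample = (
    "Terminology management supports translators. "
    "A terminological database stores terms and definitions. "
    "Terminographers document each term in context."
)

# \bterm\w* matches any word that starts with "term", ignoring case.
hits = kwic(sample, r"\bterm\w*")
for line in hits:
    print(line)
```

The pattern retrieves all word forms sharing the stem "term" in a single query, which is the main advantage of regular expressions over plain string search when looking for candidate terms.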

The second component (last 5 weeks) focuses on the theoretical bases of terminology, as well as the main methods for the retrieval of terminology in specialized domains and its systematization into conceptual systems. The main terminographic tools for the creation and management of simple and complex terminology databases are then presented. Special attention is paid to methods and resources which ensure complete interoperability with the workflows and tools presented during the CatLoc and MatPed modules.


Compulsory reading

Cabré, M. T. (1999). Terminology. Theory, methods and applications. Amsterdam and Philadelphia: John Benjamins.

Suggested reading

For the Information Mining component:

- McEnery, T., Xiao, R., and Tono, Y. (2006). Corpus-based language studies. An advanced resource book. London and New York: Routledge.

- Pym, A., Perekrestenko, A. and Starink, B. (eds.) (2006). Translation technology and its teaching. Tarragona: Intercultural Studies Group. Online [http://www.intercultural.urv.cat/media/upload/domain_317/arxius/Technology/translationtechnology.pdf].

- Sinclair, J. M. (1996). “The search for units of meaning”. Textus 9(1): 75–106.

For the "Terminology" component:

- Faini, P. (2014). Terminology Management and the Translator. From Project Planning to Database Creation. Trento: Tangram.

- Kockaert, H. J. and Steurs, F. (eds.) (2015). Handbook of Terminology. Amsterdam and Philadelphia: John Benjamins.

- Magris, M., Musacchio, M. T., Rega, L. and Scarpa, F. (2002). Manuale di terminologia. Aspetti teorici, metodologici e applicativi. Milano: Hoepli.

Teaching methods

Lessons are delivered in the form of workshops, and combine theoretical content with a strong practical and applied component.

Theoretical contents are delivered through presentations by the lecturer, and their acquisition is tested by means of in-class discussion (as well as in the final examination).

The applied part consists of hands-on practice in the lab and homework exercises, which are discussed during troubleshooting sessions in the following class. Students will be asked to hand in, either individually or in small groups, two assignments in preparation for the exam; these will focus on the information mining and terminology components of the module respectively, and the lecturer will provide detailed feedback on them. These activities are aimed at constantly monitoring progress in the development of the technological skills that are the focus of the course.

Students are expected to attend at least 70% of the module classes.

Assessment methods

For their final exam, students will hand in an individual information mining and terminology project, and will discuss it during an oral exam which also tests their knowledge of the technological tools and theoretical principles presented during the course.

The project, which will have to be handed in via email or Moodle no later than a week before every "appello" (exam session), will consist of: a) a pool of bilingual specialised corpora built both manually and semi-automatically, accompanied by a brief report on the corpus construction techniques adopted (readme file); b) a terminological database (in tab-separated + SDL MultiTerm-compliant formats) and the matching conceptual systems, through which the typical terminology of the domain represented in the corpus is systematized.

The oral exam will consist of:

1. The oral presentation of the project. This is worth 15 points. Evaluation criteria: originality in the choice of the domain for terminological inquiry; thoroughness in the sampling of relevant texts for corpus construction; clarity in the systematization of terminology (conceptual systems); formal correctness of the terminological databases.

2. One question focusing on theoretical or methodological aspects of information mining and terminology. This is worth 5 points. Evaluation criteria: relevance of the answer with respect to the question; ability to present the key concepts of the discipline.

3. Two practical exercises using the pieces of software presented in class. These are worth 10 points in total. Evaluation criteria: ability to adapt techniques to search for linguistic information and/or to manipulate texts to new domains.

The mark of the TerMine module will result from the sum of the points obtained in the three parts of the exam.

The final mark of the "Translation Technology and methods" course will be calculated as the arithmetic mean of the marks obtained in the TerMine, CatLoc and MatPed modules.
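The calculation described above can be sketched as follows. This is only an illustration: the function names and the sample scores are hypothetical, and since the source does not state how the mean is rounded, no rounding is applied here.

```python
# Maximum points for the three parts of the TerMine oral exam (from the list above).
MAX_POINTS = {
    "project_presentation": 15,
    "theory_question": 5,
    "practical_exercises": 10,
}


def termine_mark(points):
    """Sum the points obtained in the three parts of the TerMine exam."""
    for part, maximum in MAX_POINTS.items():
        if not 0 <= points[part] <= maximum:
            raise ValueError(f"{part} must be between 0 and {maximum}")
    return sum(points[part] for part in MAX_POINTS)


def course_mark(termine, catloc, matped):
    """Arithmetic mean of the three module marks (rounding rule not specified)."""
    return (termine + catloc + matped) / 3


# Hypothetical scores, for illustration only.
scores = {"project_presentation": 13, "theory_question": 4, "practical_exercises": 8}
module = termine_mark(scores)
final = course_mark(module, 28, 28)
print(module, final)
```

The three parts add up to a maximum of 30 points for the TerMine module.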

Teaching tools

Lessons are held in a computer lab with an internet connection and a projector.

Since lessons take the form of workshops, with substantial time devoted to practical hands-on exercises, students have the opportunity to become acquainted with the main software programs used in the fields of information mining and terminology, both proprietary and open-source/free.

Support materials (sample texts, slides, project files, instructions etc.) are made available through the Moodle e-learning platform.

Office hours

See the website of Adriano Ferraresi