B2712 - Language, Technology, Research II: Text Processing

Academic Year 2024/2025

  • Teaching Mode: Traditional lectures
  • Campus: Forlì
  • Degree programme: First cycle degree programme (L) in Languages and Technologies for Intercultural Communication (code 5979)

Learning outcomes

The student knows the basic features (terms, context, methods) of language data processing. S/he is able to: manage different data formats, including plain and annotated text; build and analyse corpora of different types through a variety of corpus processing and query tools; apply the acquired knowledge to address language research problems.

Course contents

The course aims to provide an overview of the tools and methods for the analysis of texts and corpora (systematic collections of texts used for linguistic research purposes). The main topics covered are:

  • Principled selection of texts (sampling); comparability and representativeness.
  • Types of corpora: specialized and reference; monolingual and bilingual comparable corpora; parallel corpora.
  • Consulting plain text corpora: corpus-based and corpus-driven methods.
  • Morphosyntactic, structural, and contextual annotation; alignment.
  • Consulting annotated corpora: corpus-based and corpus-driven methods.

Readings/Bibliography

Readings will be shared during the course.

The following are useful resources for reference purposes:

Crawford, W. and Csomay, E. 2016. Doing corpus linguistics. Oxford and New York: Routledge.

Egbert, J., Larsson, T. and Biber, D. 2020. Doing linguistics with a corpus. Cambridge: Cambridge University Press.

McEnery, T. and Hardie, A. 2012. Corpus linguistics. Method, theory and practice. Cambridge: Cambridge University Press.

Mikhailov, M. and Cooper, R. 2016. Corpus linguistics for translation and contrastive studies. Oxford and New York: Routledge.

Teaching methods

The course takes a seminar format, with direct access to tools for the construction and consultation of corpora. Students apply the knowledge acquired to build and use specialized corpora and to consult corpora freely available in the public domain.

The practical activities are structured around real-world problems that students solve by working independently and in groups. Peer support, combined with the instructor's guidance, creates a supportive, learner-centered educational environment that fosters the development of relational skills and autonomy in problem-solving.

Assessment methods

Continuous assessment of learning takes place through observation and interaction in class, as well as through non-graded activities such as simple practical tasks and oral reports.

Summative assessment is based on three tests, two of which are carried out during the course (in groups), and one after its conclusion (individually):

  1. Construction of a specialized corpus and drafting of its documentation (group submission) [30%]
  2. Extraction of information from corpora (class presentation, in groups) [30%]
  3. Description of a hypothetical research project, including purpose/relevance, hypotheses/research questions, bibliographic references, available corpora or texts, types of information to be extracted, risk assessment (post-course submission, individual) [40%]

Students who do not complete all parts as indicated above take a final individual exam consisting of the submission of the first and third tasks and of an oral presentation related to the second task.

Teaching tools

The interactive, hands-on sessions take place in a laboratory equipped with PCs and a video projector, allowing for the use of software for the construction and analysis of corpora (Intertext Editor, AntConc, NoSketch Engine). The instructor's presentations are made available via Moodle/Virtuale. Additionally, the educational materials produced within the UPSKILLS project, particularly those related to the course "Processing texts and corpora", are made available for e-learning activities.

Given the type of activities and the teaching methods adopted, all students must complete modules 1 and 2 of the training on safety in study environments before attending this course.

Office hours

See the website of Silvia Bernardini

SDGs

Quality education; Gender equality; Partnerships for the goals

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.