69706 - Linguistic Computer Science (1) (LM)

Course Unit Page

SDGs

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

  • Quality education
  • Partnerships for the goals

Academic Year 2019/2020

Learning outcomes

At the end of this course, students will be able to handle advanced corpus-management tasks and to perform relevant statistical analyses on linguistic data.

Course contents

Advanced techniques for corpus management

  • Corpus linguistics.
  • Tokenisation and sentence splitting.
  • Regular expressions.
  • Methods for text annotation.
    • XML and TEI
  • Multimodal annotations: annotation graph.
  • Case studies:
    • Written and spoken corpora (Italian/English): a review.
    • Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS.
  • Construction of a small annotated corpus in TEI format.
Statistical analysis of linguistic data.
  • On the importance of quantitative analysis for linguistics.
  • Fundamentals of the R statistical package.
  • Descriptive statistics.
  • Analytical/Inferential statistics.

Readings/Bibliography

Selected chapters from:
- Lenci, A., Montemagni, S. and Pirrelli, V. (2005). Testo e computer. Carocci.
- Gries, S. (2009). Statistics for Linguistics with R. De Gruyter.
Slides, handouts and papers can be downloaded from the course website: http://corpora.ficlit.unibo.it/LingInfLM/


Students who are unable to attend the lessons are strongly encouraged to contact the teacher for guidance, so as to avoid any misunderstanding about the course contents and reading materials.

Teaching methods

Face-to-face classes and laboratory sessions, totalling 30 hours.

Assessment methods

Students must solve three exercises set by the teacher and write a report presenting their solutions. The exam is an oral examination on the course contents and on the student's report, designed to evaluate the critical skills and methodological knowledge the student has acquired.

A clear understanding of all course topics and correct use of terminology will earn the highest marks.
Rote knowledge of the course topics or not entirely appropriate terminology will earn intermediate marks.
Gaps in knowledge of the topics or inappropriate use of terminology will earn low or failing marks, depending on the seriousness of the omissions.

Registration for the exam is compulsory and must be completed through the online procedure at https://almaesami.unibo.it/almaesami/welcome.htm.

Teaching tools

The course website is the central point of reference for all information about the course. It hosts the handouts and readings discussed during the lessons, as well as an extensive software repository for laboratory practice.

A USB key image, downloadable from the course website, has been prepared for students. It contains a complete computing environment for practising the procedures presented during the course and will also be used in the laboratory sessions.

Links to further information

http://corpora.ficlit.unibo.it/LingInfLM/

Office hours

See the website of Fabio Tamburini.