96831 - LANGUAGE DATA ANALYSIS

Course Unit Page

SDGs

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education

Academic Year 2021/2022

Learning outcomes

The student: knows the basic terms, concepts, methods and techniques needed for the quantitative analysis of language data; is able to prepare data for analysis, and to describe and visualise them; is able to formulate testable hypotheses and choose an appropriate statistical test; is able to conduct frequently used statistical tests and interpret their results.

Course contents

In this course, students will learn about the role that data analysis plays in language research, and about the components of a quantitative data analysis, from formulating research questions and hypotheses to interpreting the results. Different types of language data (from corpora and from experimental research) will be introduced, as well as different file formats in which data can be stored. Particular attention will be given to describing and visualising language data, and to performing statistical tests on them. Frequently used tests (for example, the chi-square test, correlation coefficients and t-tests) will be explained, along with the research designs they are appropriate for. The course has an extensive practical component, which will involve the use of Microsoft Excel (partially replaceable by LibreOffice Calc) and the R environment.

Readings/Bibliography

Lecture slides (to be made available on Virtuale)

Selected chapters from:

Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Desagulier, G. (2017). Corpus Linguistics and Statistics with R: Introduction to Quantitative Methods in Linguistics. Cham: Springer.

Field, A., J. Miles & Z. Field (2012). Discovering Statistics Using R. London: Sage.

Levshina, N. (2015). How to do Linguistics Using R: Data Exploration and Statistical Analysis. Amsterdam: John Benjamins.

Winter, B. (2020). Statistics for Linguists: An Introduction Using R. New York & London: Routledge.

(Details, as well as any additional or alternative readings, will be provided on Virtuale during the course.)

Teaching methods

A combination of lectures and practical exercises.

***

As concerns the teaching methods of this course unit, all students are required to attend online Modules 1 and 2 on Health and Safety.

Assessment methods

The course will be assessed through homework assignments and short tests throughout the semester (40% of the final grade) and a final project (a written report on a completed data analysis, 60% of the final grade).

Assessment scale

30 - 30 cum laude (30L) Excellent. The student has acquired all targeted concepts and is able to confidently make decisions about language data analysis and implement them in practice.

27 - 29 Above average. The student has a very good command of the targeted concepts, with some minor errors or inconsistencies in decisions about data analysis and/or their implementation.

24 - 26 Generally sound. The student has a generally good command of the targeted concepts, but with larger gaps or inconsistencies in decisions about data analysis and/or their implementation.

21 - 23 Adequate. The student has only an adequate command of the targeted concepts and displays significant shortcomings in decisions about data analysis and/or their implementation.

18 - 20 Minimum. The student has only grasped the basic targeted concepts and can only make and implement some straightforward decisions related to language data analysis.

< 18 Fail. The student does not reach a minimum threshold of knowledge and seems unable to make decisions on how to analyse language data and implement the analysis in practice.

Teaching tools

All class materials will be made available on Virtuale. The students will need to use Microsoft Excel (partially replaceable by LibreOffice Calc) and R.

Office hours

See the website of Maja Milicevic Petrovic