- Instructor: Fabio Tamburini
- Credits: 6
- Scientific disciplinary sector (SSD): L-LIN/01
- Teaching language: English
- Teaching mode: Traditional - in-person lectures
- Campus: Bologna
- Degree programme: Second-cycle degree (Laurea Magistrale) in Digital humanities and digital knowledge (code 9224)
Learning outcomes
The course introduces techniques for text manipulation. By the end of the course, students know how to process texts using computational tools, retrieve and extract information from large text corpora, annotate texts with linguistic information, classify texts and perform topic modelling, and manage social media texts to mine information, opinions and sentiments.
Course contents
Techniques for corpus creation and management
- Corpus linguistics: representativeness, annotations and querying. Zipf's law. Web as a corpus.
- Tokenisation and sentence splitting.
- Methods for Text Retrieval.
- Regular expressions.
- Multimodal annotations: annotation graph.
- XML corpora.
- Corpus querying packages.
- Case studies:
- Written and spoken corpora (Italian/English): a review.
- Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS.
- On the importance of quantitative analysis for linguistics.
- Fundamentals of R statistical package.
- Descriptive statistics.
- Analytical/Inferential statistics.
Readings/Bibliography
Selected sections from:
- McEnery T., Wilson A. (2001). Corpus Linguistics, Edinburgh University Press.
- Jurafsky D., Martin J.H. (2008). Speech and Language Processing, Prentice Hall.
- Gries S. (2009). Statistics for Linguistics with R, De Gruyter.
Slides, handouts and papers can be downloaded from the course web site.
Teaching methods
30 hours of face-to-face classes and labs.
Assessment methods
An oral examination consisting of at least three questions on the course contents.
A clear grasp of all course topics, expressed with correct terminology, earns the highest marks. Rote knowledge of the topics, or terminology that is not fully appropriate, earns intermediate marks. Gaps in knowledge or inappropriate use of terminology earn low or failing marks, depending on the seriousness of the omissions.
Teaching tools
The course web site is the central point for all information about the course. It contains the handouts and readings discussed in class, as well as a rich software repository for laboratory practice.
Links to further information
http://corpora.ficlit.unibo.it/TRAM/
Office hours
See the website of Fabio Tamburini
SDGs
This course contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.