92586 - Computational Linguistics

Academic Year 2020/2021

  • Teaching Mode: Traditional lectures
  • Campus: Forli
  • Programme: Second cycle degree programme (LM) in Specialized Translation (code 9174)

Learning outcomes

The student will learn the basic theoretical aspects of computational linguistics/natural language processing and will acquire the practical skills needed to carry out tasks ranging from tokenization and vectorization to the computation of similarities and the training of supervised models (e.g., for topic identification, structural analysis, meaning analysis).
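
As a rough illustration of the first steps named above (tokenization, vectorization, and similarity computation), the following minimal Python sketch may help. It is not course material: the function names, the regular-expression tokenizer, and the two example sentences are assumptions made here for illustration only, and it uses plain word counts rather than any specific model covered in the lessons.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract runs of ASCII letters as tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(tokens):
    """Build a bag-of-words vector: a mapping from each token to its count."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example documents, invented for this sketch.
doc_a = vectorize(tokenize("The cat sat on the mat."))
doc_b = vectorize(tokenize("The dog sat on the log."))
print(cosine_similarity(doc_a, doc_b))
```

A value close to 1 indicates that the two documents share most of their vocabulary in similar proportions, while 0 indicates no words in common.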


Course contents

Although the contents may be (slightly) adapted to the students' skills and interests, the general structure of the course will be as follows.

 

  1. Introduction to Computational Linguistics
  2. Introduction to Python scripting
  3. Words and vector space model
  4. Naive Bayes
  5. Word vectors
  6. From counts to meaning
  7. Training and evaluation in machine learning
  8. Corpora
  9. Introduction to LSA
  10. Introduction to neural networks
  11. Word embeddings
  12. From document representations towards sequences
  13. Convolutions for text
  14. Text is sequential
  15. Beyond

Readings/Bibliography

  1. Hobson Lane, Cole Howard, Hannes Hapke (2019). Natural Language Processing in Action: Understanding, analyzing, and generating text with Python. Manning Publications.
  2. Kenneth Ward Church. Unix for poets [https://www.cs.upc.edu/~padro/Unixforpoets.pdf].

    Optional
  3. Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft). Draft chapters in progress, October 16, 2019.
  4. Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python. http://www.nltk.org/book/
  5. Yoav Goldberg. (2017). Neural Network Methods for Natural Language Processing (G. Hirst, ed.). Morgan & Claypool Publishers.
  6. Emily M. Bender (2013). Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Teaching methods

The course will combine seminars and practical sessions. In both cases, the students will be expected to participate actively. We will start with an introduction to the Python programming language and continue with a (practical) description of diverse models and tasks.

Attendance at a minimum of 70% of the lessons is mandatory.

Assessment methods

The student will address a problem within his/her own research interests using the knowledge acquired during the course. Once the topic has been agreed upon, the student will work on solving the problem and will produce a written report. A poster session will be organized at the end of the course, in which the students will present their research work.

The final evaluation will be computed as a combination of the report and the poster presentation.

Grading scale

  • 30-30L: The student possesses an in-depth knowledge of the topic and an outstanding ability to apply the concepts. The student carries out rigorous formal experiments and produces an outstanding report, good enough to be considered for submission to a national conference in the field.
  • 27-29: The student possesses an in-depth knowledge of the topic, a sound ability to apply concepts, and good analytical skills. The student carries out good formal experiments and produces a high-quality report.
  • 24-26: The student possesses a fair knowledge of the topic and a reasonable ability to apply concepts correctly. The student carries out some reasonable experiments and produces a good report.
  • 21-23: The student possesses an adequate, but not in-depth, knowledge of the topic and a partial ability to apply concepts. The student carries out flawed experiments and produces a reasonable report.
  • 18-20: The student possesses a barely adequate and only superficial knowledge of the topic and only an inconsistent ability to apply concepts. The student carries out incorrect experiments and produces a deficient report.
  • < 18 Fail: The student possesses an inadequate knowledge of the topic and makes significant errors in applying concepts. Both experiments and report are poor.

Teaching tools

Seminars will be delivered with slides, and coding will be done in Jupyter notebooks. Exercises will be assigned continuously throughout the course.

Office hours

See the website of Luis Alberto Barron Cedeno