92586 - Computational Linguistics

Course Unit Page

Academic Year 2021/2022

Learning outcomes

The student will learn the basic theoretical aspects of computational linguistics/natural language processing and will acquire practical skills ranging from tokenization and vectorization to computing similarities and building supervised models (e.g., for topic identification or sentiment analysis).
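
As an informal illustration of the first steps named above (tokenization, vectorization, and similarity computation), the following is a minimal sketch in plain Python. It is not taken from the course material: the function names, the regular-expression tokenizer, and the toy documents are assumptions made only for this example, and a real project would more likely rely on an established NLP library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens):
    """Represent a document as a sparse bag-of-words count vector."""
    return Counter(tokens)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy documents (hypothetical, for illustration only).
docs = [
    "The cat sat on the mat.",
    "A cat sat on a mat.",
    "Stock prices fell sharply today.",
]
vectors = [vectorize(tokenize(d)) for d in docs]

# The first two documents share four of their words; the third shares none.
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

The first pair scores close to 0.5 and the second pair scores 0, which matches the intuition that documents sharing more words lie closer together in the vector space.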


Course contents

Although the contents may be adapted slightly according to the students' skills and interests, the general structure of the course will be as follows.

 

  1. Introduction to Computational Linguistics
  2. Words and vector space model
  3. Naive Bayes
  4. Word vectors
  5. From counts to meaning
  6. Training and evaluation in machine learning
  7. Corpora
  8. Introduction to LSA
  9. Introduction to neural networks
  10. Word embeddings
  11. From document representations towards sequences
  12. Convolutional NNs for text
  13. Recurrent NNs for text
  14. Beyond

Readings/Bibliography

  1. Hobson Lane, Cole Howard, Hannes Hapke (2019). Natural Language Processing in Action: Understanding, analyzing, and generating text with Python. Manning Publications.

    Optional
  2. Dirk Hovy (2020). Text Analysis in Python for Social Scientists. Cambridge University Press.
  3. Kenneth Ward Church. Unix for poets.
  4. Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft). Draft chapters in progress, October 16, 2019.
  5. Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python. http://www.nltk.org/book/
  6. Yoav Goldberg. (2017). Neural Network Methods for Natural Language Processing (G. Hirst, ed.). Morgan & Claypool Publishers.
  7. Emily M. Bender (2013). Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Teaching methods

The course will combine seminars and practical sessions. In both, students will be expected to participate actively. A crash course on Python programming will be held in the preceding semester (date and place TBD).

As part of the teaching methods of this course unit, all students must attend the online Modules 1 and 2 on Health and Safety.

Assessment methods

The student will address a problem within their own research interests using the knowledge acquired during the course. Once the topic has been agreed with the instructor, the student will work on solving the problem and will write a report.

The final grade will be computed as a combination of the report and an oral exam on it. The final report must be submitted at least one week before the exam.

Grading scale

  • 30-30L: The student possesses an in-depth knowledge of the topic and an outstanding ability to apply the concepts. The student carries out rigorous formal experiments and produces an outstanding report, good enough to be considered for submission to a national conference in the field.
  • 27-29: The student possesses an in-depth knowledge of the topic, a sound ability to apply concepts, and good analytical skills. The student carries out good formal experiments and produces a high-quality report.
  • 24-26: The student possesses a fair knowledge of the topic and a reasonable ability to apply concepts correctly. The student carries out some reasonable experiments and produces a good report.
  • 21-23: The student possesses an adequate, but not in-depth, knowledge of the topic and a partial ability to apply concepts. The student carries out faulty experiments and produces a reasonable report.
  • 18-20: The student possesses a barely adequate and only superficial knowledge of the topic and only an inconsistent ability to apply concepts. The student carries out wrong experiments and produces a deficient report.
  • < 18 (Fail): The student possesses an inadequate knowledge of the topic and makes significant errors in applying concepts. Both experiments and report are poor.

Teaching tools

Seminars will be based on slides, and coding will be done in Jupyter notebooks. Exercises will be assigned throughout the course.

Office hours

See the website of Luis Alberto Barron Cedeno