91258 - NATURAL LANGUAGE PROCESSING

Course Unit Page

Academic Year 2022/2023

Learning outcomes

The student knows the basic theoretical aspects of Natural Language Processing (NLP); s/he is able to perform core NLP tasks, such as tokenisation, vectorisation and similarity computation; s/he is able to build and analyse corpora to construct prediction models; s/he is able to apply supervised models to typical NLP problems such as topic identification or sentiment analysis.
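To give a concrete idea of the three core tasks named above, here is a minimal sketch in plain Python. It is not taken from the course materials. It assumes a deliberately simple pipeline: lower-cased regex tokenisation (which drops punctuation and non-ASCII letters), a raw-count bag-of-words vector, and cosine similarity between two such vectors. The function names `tokenise`, `vectorise` and `cosine_similarity` are hypothetical, chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into ASCII alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorise(tokens):
    """Bag-of-words vector: map each token to its raw count."""
    return Counter(tokens)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing keys, so terms absent from v contribute nothing.
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


doc_a = vectorise(tokenise("The cat sat on the mat."))
doc_b = vectorise(tokenise("A cat sat on a hat."))
print(round(cosine_similarity(doc_a, doc_b), 3))  # → 0.375
```

The two sentences share three tokens ("cat", "sat", "on"), and each vector has norm sqrt(8), so the similarity is 3/8. Practical notebooks would typically replace these hand-written helpers with library tokenisers and vectorisers and with weighting schemes such as tf-idf.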

Course contents

Although the contents may be slightly adapted to the students' skills and interests, the general structure of the course will be as follows.

  1. Introduction to Computational Linguistics
  2. Words and vector space model
  3. Naive Bayes
  4. Word vectors
  5. From counts to meaning
  6. Training and evaluation in machine learning
  7. Introduction to LSA
  8. Introduction to neural networks
  9. Word embeddings
  10. From document representations towards sequences
  11. Convolutional NNs for text
  12. Recurrent NNs for text
  13. Beyond


Readings/Bibliography

  1. Hobson Lane, Cole Howard, Hannes Hapke (2019). Natural Language Processing in Action: Understanding, analyzing, and generating text with Python [https://www.manning.com/books/natural-language-processing-in-action]. Manning Publications.

    Optional
  2. Dirk Hovy (2020). Text analysis in Python for social scientists. Cambridge University Press.
  3. Kenneth Ward Church. Unix for Poets.
  4. Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft). Draft chapters in progress, October 16, 2019.
  5. Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python.
  6. Yoav Goldberg (2017). Neural Network Methods for Natural Language Processing (G. Hirst, ed.). Morgan & Claypool Publishers.
  7. Emily M. Bender (2013). Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Teaching methods

The course will combine seminars and practical sessions; in both, students are expected to participate actively. A crash course on Python programming will be held in the preceding semester (date and place to be determined).

As concerns the teaching methods of this course unit, all students must attend online Modules 1 and 2.

Assessment methods

Each student will address a problem within his/her own research interests, applying the knowledge acquired during the course. Once the topic has been agreed with the instructor, the student will work on solving the problem and write a report.

The final grade will combine the report and an oral exam discussing it. The report must be submitted at least one week before the exam.

Grading scale

  • 30–30L: The student possesses in-depth knowledge of the topic and an outstanding ability to apply the concepts. The student carries out rigorous formal experiments and produces an outstanding report, good enough to be considered for submission to a national conference in the field.
  • 27–29: The student possesses in-depth knowledge of the topic, a sound ability to apply concepts, and good analytical skills. The student carries out good formal experiments and produces a high-quality report.
  • 24–26: The student possesses fair knowledge of the topic and a reasonable ability to apply concepts correctly. The student carries out some reasonable experiments and produces a good report.
  • 21–23: The student possesses adequate, but not in-depth, knowledge of the topic and a partial ability to apply concepts. The student carries out faulty experiments and produces a reasonable report.
  • 18–20: The student possesses barely adequate, superficial knowledge of the topic and only an inconsistent ability to apply concepts. The student carries out wrong experiments and produces a deficient report.
  • < 18 Fail: The student possesses inadequate knowledge of the topic and makes significant errors in applying concepts. Both experiments and report are poor.


Teaching tools

Seminars will be delivered with slides, and coding will be done in Jupyter notebooks [https://jupyter.org/]. Exercises will be carried out throughout the course.

Links to further information

https://albarron.github.io/teaching/natural-language-processing/

Office hours

See the website of Luis Alberto Barron Cedeno