Undergraduate Course: Foundations of Natural Language Processing (INFR10078)
Course Outline
School | School of Informatics |
College | College of Science and Engineering |
Credit level (Normal year taken) | SCQF Level 10 (Year 3 Undergraduate) |
Availability | Available to all students |
SCQF Credits | 20 |
ECTS Credits | 10 |
Summary | This course covers some of the foundations of natural language processing (NLP) and equips students for more advanced NLP courses in year 4. We focus on what makes automatic processing of language unique and challenging: its statistical properties, complex structure, and pervasive ambiguity. We cover a range of architectures and algorithms for NLP. The course starts with simple models for text classification and generation. We will then discuss neural models to represent the meaning of words and model language, such as Recurrent Neural Networks and Transformers.
Students will gain insight into the technology behind contemporary Large Language Models, including pre-training and supervised fine-tuning techniques. As part of the course, we will also introduce methodological and ethical considerations (e.g., evaluation, data collection, algorithmic bias) that are important for working in the field. |
Course description |
The course will first introduce simple and interpretable models, such as n-gram and bag-of-words models and logistic regression, to illustrate a range of NLP tasks (language modelling, classification, and generation), as well as the basic framework for NLP experiments (training, evaluation, baselines). We will also introduce classic approaches to predicting linguistic representations (HMMs and PCFGs).
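As a rough illustration (not part of the official course materials), an n-gram language model of the kind mentioned above can be sketched in a few lines of Python. The toy corpus and the choice of add-one (Laplace) smoothing are assumptions made purely for this example:

```python
import math
from collections import Counter

# Tiny toy corpus, tokenised by hand for illustration.
corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]

# Pad each sentence with start/end markers, then count
# histories (unigrams) and bigrams.
BOS, EOS = "<s>", "</s>"
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = [BOS] + sent + [EOS]
    unigrams.update(tokens[:-1])            # history counts
    bigrams.update(zip(tokens, tokens[1:]))  # (history, next-word) counts

# Possible next words: all corpus words plus the end marker.
vocab = {w for s in corpus for w in s} | {EOS}
V = len(vocab)

def bigram_logprob(sentence):
    """Log probability of a sentence under the add-one-smoothed bigram model."""
    tokens = [BOS] + sentence + [EOS]
    logp = 0.0
    for h, w in zip(tokens, tokens[1:]):
        logp += math.log((bigrams[(h, w)] + 1) / (unigrams[h] + V))
    return logp
```

Even this toy model assigns a higher probability to word orders attested in its training data than to scrambled versions of the same words, which is the basic idea behind using language models for generation.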
Next, we will cover neural architectures for NLP (such as Multi-Layer Perceptrons, Recurrent Neural Networks, and Transformers), which are more opaque and data-hungry, but also achieve better performance on NLP tasks.
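To make this concrete, here is a minimal sketch (again illustrative only, not course material) of scaled dot-product attention, the core operation inside the Transformer architecture mentioned above; the vectors below are tiny hand-picked toy values rather than learned parameters:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query produces a weighted
    average of the value vectors, with weights given by softmaxed
    query-key dot products, scaled by sqrt(key dimension)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

A query that aligns closely with one key draws its output almost entirely from the corresponding value vector, which is how attention lets a model focus on the relevant parts of a long input sequence.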
Finally, we will discuss the framework of transfer learning, which leverages large amounts of unsupervised data, and the training pipeline for Large Language Models.
Throughout the course, we will introduce concepts and findings from linguistics as a way to understand the challenges of this type of data. We will discuss the strengths and weaknesses of different approaches, including both technical and ethical challenges (such as bias and interpretability). We will illustrate how NLP models can be used for specific applications (e.g., translation and summarisation). |
Information for Visiting Students
Pre-requisites | Understanding of basic probability (e.g., Bayes' rule).
Familiarity with basic computational techniques (e.g., recursion, dynamic programming).
Ability to code in Python.
Basic knowledge of linguistic categories (e.g., noun, verb).
Familiarity with first-order logic. |
High Demand Course? | Yes |
Course Delivery Information
Academic year 2025/26, Available to all students (SV1)
Quota | None |
Course Start | Semester 2 |
Timetable |
Learning and Teaching activities (Further Info) | Total Hours: 200 (Lecture Hours 30, Seminar/Tutorial Hours 5, Supervised Practical/Workshop/Studio Hours 5, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 156) |
Assessment (Further Info) | Written Exam 75%, Coursework 25%, Practical Exam 0% |
Additional Information (Assessment) |
The coursework will include two practical assignments with written reports, in which parts of an NLP system will be implemented and the results analysed. |
Feedback |
Tutorial exercises will be pen-and-paper (e.g., using an algorithm to analyse a toy example step by step). Students will prepare answers in advance of the tutorial, then present their analyses and receive feedback on them during the tutorial.
Labs will involve a small amount of programming: implementing algorithms taught in the lectures, running them on corpora, and evaluating the results, with demonstrators available for guidance.
Coursework will involve more extensive implementation of the algorithms and models taught in lectures. Feedback will consist of a grade plus qualitative comments. |
Exam Information
Exam Diet | Paper Name | Minutes |
Main Exam Diet S2 (April/May) | Foundations of Natural Language Processing (INFR10078) | 120 |
Learning Outcomes
On completion of this course, the student will be able to:
- explain and provide examples illustrating some of the main challenges facing machine learning approaches to natural language data, including issues arising from properties of language (e.g., long sequence modelling; variation across languages, domains, and genres) and from social and ethical concerns (e.g., algorithmic bias and discrimination, interpretability)
- describe NLP models for classification and generation tasks; the experimental setup for training and testing; and how these models address some of the technical challenges described above
- for a range of NLP applications, outline possible approaches, including standard data sets, models, and evaluation methods; discuss potential strengths and weaknesses of the suggested approaches (including both technical and ethical issues, where appropriate), and provide illustrative examples
- implement parts of an NLP system with the help of appropriate support code and/or tools. Evaluate and interpret the results of implemented methods on natural language data sets
|
Reading List
REQUIRED: Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd edition online; 2009 2nd edition for chapters not yet updated in the 3rd edition).
RECOMMENDED: Bird, S., E. Klein and E. Loper, Natural Language Processing with Python (2009), O'Reilly Media. |
Additional Information
Course URL |
https://opencourse.inf.ed.ac.uk/fnlp |
Graduate Attributes and Skills |
Cognitive skills: critical thinking (via tutorials, labs and assessed work), detecting and handling ambiguity (via the study of linguistic ambiguity in this course).
Responsibility, autonomy and effectiveness: self-awareness and reflection (via acquiring the skill of perceiving linguistic ambiguity that people do not normally notice in everyday language processing), independent learning (via the labs, required reading and preparation for tutorials), exploration and testing of evidence for or against a hypothesis (via the labs and tutorials), time management (via coursework).
Communication: written communication. |
Additional Class Delivery Information |
30 lectures, 5 tutorials and 5 labs. The tutorials and labs will occur in alternate weeks. |
Keywords | natural language, corpus-based methods, machine learning |
Contacts
Course organiser | Dr Ivan Titov
Tel: (0131 6)51 3092
Email: ititov@exseed.ed.ac.uk |
Course secretary | Miss Rose Hynd
Tel: (0131 6)50 5194
Email: rhynd@ed.ac.uk |