THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2022/2023

Timetable information in the Course Catalogue may be subject to change.

Undergraduate Course: Foundations of Natural Language Processing (INFR10078)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 10 (Year 3 Undergraduate)
Availability: Available to all students
SCQF Credits: 20
ECTS Credits: 10
Summary: ***This course replaces Foundations of Natural Language Processing (INFR09028).***

This course covers some of the linguistic and algorithmic foundations of natural language processing (NLP). It builds on algorithmic and data science concepts developed in second year courses, applying these to NLP problems. It also equips students for more advanced NLP courses in year 4. The course is strongly empirical, using corpus data to illustrate both core linguistic concepts and algorithms, including language modeling, part of speech tagging, syntactic processing, the syntax-semantics interface, and aspects of semantic and pragmatic processing. The theoretical study of linguistic concepts and the application of algorithms to corpora in the empirical analysis of those concepts will be interleaved throughout the course.
Course description: An indicative list of topics to be covered includes:

1. Lexicon and lexical processing:
* morphology
* language modeling
* hidden Markov models and associated algorithms
* part of speech tagging (e.g., for a language other than English) to illustrate HMMs
* smoothing
* text classification
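
As a rough illustration of the language modeling and smoothing topics listed above, the following is a minimal sketch of a bigram model with add-one (Laplace) smoothing. The function names, the padding symbols `<s>` and `</s>`, and the toy corpus are assumptions made for this example; they are not taken from the course materials.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count histories and bigrams over sentences padded with <s> and </s>."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        # Every symbol except <s> can be predicted, so it belongs to the vocabulary.
        vocab.update(padded[1:])
        for prev, word in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts, vocab


def add_one_prob(prev, word, history_counts, bigram_counts, vocab):
    """P(word | prev) with add-one smoothing over the vocabulary."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))


# Hypothetical toy corpus, used only for illustration.
toy_corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
history_counts, bigram_counts, vocab = train_bigram_counts(toy_corpus)

p_seen = add_one_prob("the", "dog", history_counts, bigram_counts, vocab)
p_unseen = add_one_prob("the", "barks", history_counts, bigram_counts, vocab)
print(p_seen)    # 0.25
print(p_unseen)  # 0.125
```

The unseen bigram "the barks" receives a small non-zero probability instead of zero, which is the point of smoothing; add-one is only the simplest of several smoothing methods.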

2. Syntax and syntactic processing:
* the Chomsky hierarchy
* syntactic concepts: constituency (and tests for it), subcategorization, bounded and unbounded dependencies, feature representations
* context-free grammars
* lexicalized grammar formalisms (e.g., dependency grammar)
* chart parsing and dependency parsing (e.g., shift-reduce parsing)
* treebanks: lexicalized grammars and corpus annotation
* statistical parsing

3. Semantics and semantic processing:
* word senses: regular polysemy and the structured lexicon; distributional models; word embeddings (including biases found)
* compositionality: constructing a formal semantic representation from a (disambiguated) sentential syntactic analysis
* predicate argument structure
* word sense disambiguation
* semantic role labelling
* pragmatic phenomena in discourse and dialogue, including anaphora, presuppositions, implicatures and coherence relations
* labelled corpora addressing word senses (e.g., Brown), semantic roles (e.g., PropBank, SemCor), discourse information (e.g., PDTB, STAC, RST Treebank)

4. Data and evaluation (interspersed throughout other topics):
* cross-linguistic similarities and differences
* commonly used datasets
* annotation methods and issues (e.g., crowdsourcing, inter-annotator agreement)
* evaluation methods and issues (e.g., standard metrics, baselines)
* effects of biases in data
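
Inter-annotator agreement, mentioned under annotation methods above, is often summarised with Cohen's kappa, which corrects raw agreement between two annotators for the agreement expected by chance. The sketch below is one possible implementation; the function name and the toy part-of-speech labels are assumptions for illustration, and the syllabus does not state which agreement measure the course uses.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability of agreeing if each annotator labelled
    # at random according to their own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six tokens as noun (N) or verb (V).
annotator_1 = ["N", "V", "N", "N", "V", "N"]
annotator_2 = ["N", "V", "V", "N", "V", "N"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.667
```

Here the annotators agree on 5 of 6 items (0.833), but half of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement rate.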

Entry Requirements (not applicable to Visiting Students)
Pre-requisites: Students MUST have passed: Informatics 2A - Processing Formal and Natural Languages (INFR08008) OR Informatics 2 - Introduction to Algorithms and Data Structures (INFR08026) OR Informatics Research Review (INFR11136)
Co-requisites:
Prohibited Combinations: Students MUST NOT also be taking Accelerated Natural Language Processing (INFR11125) OR Foundations of Natural Language Processing (INFR09028)
Other requirements: Open to MSc students, so long as they have not taken ANLP and they have the following expertise:

* Understanding of basic probability (e.g., Bayes' rule)
* Familiarity with basic computational processes (e.g., recursion, dynamic programming)
* Ability to code in Python
* Basic knowledge of linguistic categories (e.g., noun, verb)
* Familiarity with first-order logic
Information for Visiting Students
Pre-requisites:

* Understanding of basic probability (e.g., Bayes' rule)
* Familiarity with basic computational processes (e.g., recursion, dynamic programming)
* Ability to code in Python
* Basic knowledge of linguistic categories (e.g., noun, verb)
* Familiarity with first-order logic
High Demand Course? Yes
Course Delivery Information
Academic year 2022/23, Available to all students (SV1), Quota: None
Course Start: Semester 2
Learning and Teaching activities: Total Hours: 200 (Lecture Hours 30, Seminar/Tutorial Hours 5, Supervised Practical/Workshop/Studio Hours 5, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 156)
Assessment: Written Exam 75%, Coursework 25%, Practical Exam 0%
Additional Information (Assessment): Tutorials and labs will both consist of exercises, on which students will receive formative feedback from the tutors and demonstrators.
Feedback: Tutorial exercises will be pen and paper (e.g., using an algorithm to analyse a toy example step by step). Students will prepare answers in advance of the tutorial, then present their analyses and get feedback on them during the tutorial.

Labs will involve a small amount of programming: implementing algorithms taught in the lectures, running them on corpora and evaluating the results, with demonstrators available for guidance.

Coursework will involve more extensive implementation of the algorithms and models taught in lectures. Feedback will be a raw grade plus qualitative feedback.
Exam Information
Exam Diet | Paper Name | Hours & Minutes
Main Exam Diet S2 (April/May) | Foundations of Natural Language Processing (INFR10078) | 2:00
Learning Outcomes
On completion of this course, the student will be able to:
  1. Identify and analyze examples of ambiguity in natural language: ambiguity in part-of-speech, word sense, syntax, semantics and pragmatics. Explain how ambiguity presents a problem for computational analysis and NLP applications and some of the ways it can be addressed (see (2) to (5)).
  2. Describe and apply standard sequence models (e.g., HMMs), classification models (e.g., Naïve Bayes, MaxEnt) and parsing algorithms (e.g., statistical chart parsing and dependency parsing) for processing language at different levels (e.g., morphology, syntax and semantics), and simulate each algorithm step by step on toy linguistic examples with pen and paper.
  3. Explain and provide examples of how sparse data can be a problem for machine learning in NLP; describe and apply methods for addressing the sparse data problem.
  4. Given an appropriate NLP problem, identify suitable evaluation measures for testing solutions to the problem, explain the role of annotated corpora in developing those solutions, and assess and justify which sequence of algorithms is most appropriate for solving the problem, based on an understanding of the algorithms in (2) and (3).
  5. Implement parts of the NLP pipeline with the help of appropriate support code and/or tools. Evaluate and interpret the results of implemented methods on natural language data sets.
Reading List
REQUIRED: Jurafsky, D. and J. Martin, Speech and Language Processing (3rd edition online; 2nd edition, 2009, for chapters not yet updated in the 3rd edition).

RECOMMENDED: Bird, S., E. Klein and E. Loper, Natural Language Processing with Python, (2009) O'Reilly Media.
Additional Information
Graduate Attributes and Skills:
Cognitive skills: critical thinking (via tutorials, labs and assessed work), detecting and handling ambiguity (via the study of linguistic ambiguity in this course).

Responsibility, autonomy and effectiveness: self-awareness and reflection (via learning to perceive linguistic ambiguity that people do not notice in normal language processing), independent learning (via the labs, required reading and preparation for tutorials), exploration and testing of evidence towards (or against) a hypothesis (via the labs and tutorials), time management (via coursework).

Communication: written communication.
Special Arrangements: None
Additional Class Delivery Information: 30 lectures, 5 tutorials and 5 labs. The tutorials and labs will occur in alternate weeks.
Keywords: natural language, corpus-based methods, machine learning
Contacts
Course organiser: Prof Henry Thompson
Tel: (0131 6)50 4440
Email: ht@inf.ed.ac.uk
Course secretary: Mrs Michelle Bain
Tel: (0131 6)51 7607
Email: michelle.bain@ed.ac.uk