THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2021/2022

Information in the Degree Programme Tables may still be subject to change in response to Covid-19

Undergraduate Course: Text Technologies for Data Science (INFR11145)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Year 4 Undergraduate)
Availability: Available to all students
SCQF Credits: 20
ECTS Credits: 10
Summary: This course teaches the basic technologies required for text processing, focussing mainly on information retrieval and text classification. It gives a detailed overview of information retrieval and describes how search engines work. It also covers basic knowledge of the main steps for text classification.

This is a highly practical course: at least 50% of the taught material is implemented from scratch in coursework and labs, and students must complete a final project in small groups. All lectures, labs, and the first two coursework assignments take place in Semester 1. The final group project is due early in Semester 2, by week 3 or 4.
Course description
Syllabus:
* Introduction to IR and text processing, system components
* Zipf, Heaps, and other text laws
* Pre-processing: tokenization, normalisation, stemming, stopping
* Indexing: inverted index, Boolean and proximity search
* Evaluation methods and measures (e.g., precision, recall, MAP, significance testing)
* Query expansion
* IR toolkits and applications
* Ranked retrieval and learning to rank
* Text classification: feature extraction, baselines, evaluation
* Web search
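
As an illustration of the pre-processing and indexing topics in the syllabus, the following is a minimal sketch of a positional inverted index with Boolean AND and proximity search. It is not taken from the course materials: the function names, the toy stop-word list, and the three sample documents are all assumptions made for this example, and stemming is omitted for brevity. Positions are counted after stop words have been removed.

```python
import re
from collections import defaultdict

# Hypothetical toy stop-word list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def preprocess(text):
    """Tokenise on alphanumeric runs, lowercase, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(preprocess(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def boolean_and(index, term_a, term_b):
    """Return the sorted ids of documents containing both terms."""
    return sorted(set(index.get(term_a, {})) & set(index.get(term_b, {})))


def proximity(index, term_a, term_b, window):
    """Return sorted ids of documents where the terms occur within `window` positions."""
    hits = []
    for doc_id in boolean_and(index, term_a, term_b):
        pos_a = index[term_a][doc_id]
        pos_b = index[term_b][doc_id]
        if any(abs(a - b) <= window for a in pos_a for b in pos_b):
            hits.append(doc_id)
    return hits


# Made-up sample collection for the example.
docs = {
    1: "The inverted index is the core of a search engine",
    2: "Boolean search returns documents matching all query terms",
    3: "A search over an index can be Boolean or ranked",
}
index = build_index(docs)
print(boolean_and(index, "search", "index"))      # [1, 3]
print(proximity(index, "boolean", "search", 1))   # [2]
```

Storing positions rather than only document ids is what makes proximity (and phrase) search possible; a plain Boolean index would only need a set of document ids per term.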
Entry Requirements (not applicable to Visiting Students)
Pre-requisites Students MUST have passed:
Co-requisites
Prohibited Combinations
Other requirements
Maths requirements:
1. Linear algebra: Strong knowledge of vectors and matrices and the related mathematical operations (addition, multiplication, inverse, projections, etc.).
2. Probability theory: Discrete and continuous univariate random variables. Bayes rule. Expectation, variance. Univariate Gaussian distribution.
3. Calculus: Functions of several variables. Partial differentiation. Multivariate maxima and minima.
4. Special functions: Log, Exp, Ln.

Programming requirements:
1. Python and/or Perl, and good knowledge of regular expressions
2. Shell commands (cat, sort, grep, sed, ...)
3. An additional programming language could be useful for the course project.

Team-work requirement:
The final course project is carried out in groups of 4-6 students. Working in a team on the project is a requirement.
Information for Visiting Students
Pre-requisites
Maths requirements:
1. Linear algebra: Strong knowledge of vectors and matrices and the related mathematical operations (addition, multiplication, inverse, projections, etc.).
2. Probability theory: Discrete and continuous univariate random variables. Bayes rule. Expectation, variance. Univariate Gaussian distribution.
3. Calculus: Functions of several variables. Partial differentiation. Multivariate maxima and minima.
4. Special functions: Log, Exp, Ln.

Programming requirements:
1. Python and/or Perl, and good knowledge of regular expressions
2. Shell commands (cat, sort, grep, sed, ...)
3. An additional programming language could be useful for the course project.

Team-work requirement:
The final course project is carried out in groups of 4-6 students. Working in a team on the project is a requirement.
High Demand Course? Yes
Course Delivery Information
Academic year 2021/22, Available to all students (SV1), Quota: None
Course Start: Full Year
Course Start Date: 20/09/2021
Learning and Teaching activities: Total Hours: 200 (Lecture Hours 18, Supervised Practical/Workshop/Studio Hours 12, Summative Assessment Hours 2, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 164)
Assessment: Written Exam 30%, Coursework 70%, Practical Exam 0%
Additional Information (Assessment): Written Exam 30%, Coursework 70%

Coursework (CW) accounts for 70% of the total mark, split as follows:

CW1: 10%, individual work covering the implementation of a basic search engine

CW2: 20%, individual work covering IR evaluation and web search

CW3: 40%, a group project with 4-6 members per group


All of the coursework is heavy on system implementation, so familiarity with programming and software engineering is a pre-requisite. Python is required for implementing CW1 and CW2. For CW3, students are free to use the implementation language they prefer.
Feedback: Not entered
Exam Information
Exam Diet Paper Name Hours & Minutes
Main Exam Diet S2 (April/May) 2:00
Learning Outcomes
On completion of this course, the student will be able to:
  1. Build basic search engines from scratch, and use IR tools for searching massive collections of text documents
  2. Build feature extraction modules for text classification
  3. Implement evaluation scripts for IR and text classification
  4. Understand how web search engines (such as Google) work
  5. Work effectively in a team to produce working systems
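
To give a sense of what an evaluation script for IR (outcome 3) might involve, here is a minimal sketch computing set-based precision and recall, average precision (AP), and mean average precision (MAP). It is an illustration only, not the course's own tooling: the function names, the `runs`/`qrels` dictionary layout, and the sample judgements are assumptions, and each ranked list is assumed to contain no duplicate document ids.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Sum of precision@k at each rank k holding a relevant document,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """Average AP over every query that has relevance judgements."""
    aps = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up system output (ranked lists) and relevance judgements.
runs = {"q1": ["d3", "d1", "d7", "d2"], "q2": ["d4", "d9"]}
qrels = {"q1": {"d1", "d2", "d5"}, "q2": {"d4"}}

p, r = precision_recall(runs["q1"], qrels["q1"])
print(f"q1 precision={p:.3f} recall={r:.3f}")
print(f"q1 AP={average_precision(runs['q1'], qrels['q1']):.3f}")
print(f"MAP={mean_average_precision(runs, qrels):.3f}")
```

For `q1`, relevant documents appear at ranks 2 and 4, so AP is (1/2 + 2/4) / 3, about 0.333; the unretrieved relevant document `d5` still counts in the denominator, which is why AP penalises missed documents.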
Reading List
"Introduction to Information Retrieval", C.D. Manning, P. Raghavan and H. Schütze

"Search Engines: Information Retrieval in Practice", W. Bruce Croft, Donald Metzler, Trevor Strohman

"Machine Learning in Automated Text Categorization", F. Sebastiani

"The Zipf Mystery"

Additional research papers and videos to be recommended during lectures
Additional Information
Course URL: http://www.inf.ed.ac.uk/teaching/courses/ttds
Graduate Attributes and Skills: Not entered
Additional Class Delivery Information: 18 lectures
Keywords: Not entered
Contacts
Course organiser: Dr Walid Magdy
Tel: (0131 6)51 5612
Email: wmagdy@inf.ed.ac.uk
Course secretary: Miss Lori Anderson
Tel: (0131 6)51 4164
Email: lori.anderson@ed.ac.uk