THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2017/2018


Undergraduate Course: Text Technologies for Data Science (INFR11145)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Year 4 Undergraduate)
Availability: Available to all students
SCQF Credits: 20
ECTS Credits: 10
Summary: The course covers the retrieval technologies behind search engines such as Google. It aims to strike a balance between the theoretical and system-related aspects of the field. The course will cover:

1. Theoretical aspects, including properties of text, queries, relevance, major retrieval models and evaluation;
2. System-related aspects, including crawlers, text processing, index construction and retrieval algorithms.
Course description: Syllabus
1. Introduction: search applications, search tasks, user information needs.
2. Definitions: documents, queries, bag-of-words trick.
3. Laws of text: Zipf, Heaps, clumping, index size.
4. Vector space: term weighting, similarity functions.
5. Vocabulary mismatch: tokenization, stemming, synonyms.
6. Indexing: inverted lists, compression, query execution.
7. Web crawling: XML feeds, crawling, expected age.
8. Content extraction: XML tags, DOM, Finn's method.
9. Locality Sensitive Hashing: duplicates, Simhash.
10. Evaluation: recall, precision, F1, MAP, nDCG, query logs.
11. Web search: PageRank, hubs and authorities, link spam.
12. Probabilistic model: probability ranking principle, BM25.
13. Relevance models: exchangeability, cross-language search.
14. Language models for IR: query likelihood, smoothing.
15. Machine learning in IR: PA, SVM, SMO algorithms, LeToR.
16. Social media search: nature, challenges, tasks.
17. Information filtering: topic drift.
18. Text classification.
Entry Requirements (not applicable to Visiting Students)
Pre-requisites: Students MUST have passed:
Co-requisites:
Prohibited Combinations:
Other requirements:
Maths requirements:
1. Linear algebra: Strong knowledge of vectors and matrices and all related mathematical operations (addition, multiplication, inverse, projections, etc.).
2. Probability theory: Discrete and continuous univariate random variables. Bayes rule. Expectation, variance. Univariate Gaussian distribution.
3. Calculus: Functions of several variables. Partial differentiation. Multivariate maxima and minima.
4. Special functions: Log, Exp, Ln.

Programming requirements:
1. Python and/or Perl, and a good knowledge of regular expressions.
2. Shell commands (cat, sort, grep, sed, ...).
3. An additional programming language could be useful for the course project.

Team-work requirement:
The final course project will be done in groups of 4-6 students. Working in a team for the project is a requirement.
Information for Visiting Students
Pre-requisites:
Maths requirements:
1. Linear algebra: Strong knowledge of vectors and matrices and all related mathematical operations (addition, multiplication, inverse, projections, etc.).
2. Probability theory: Discrete and continuous univariate random variables. Bayes rule. Expectation, variance. Univariate Gaussian distribution.
3. Calculus: Functions of several variables. Partial differentiation. Multivariate maxima and minima.
4. Special functions: Log, Exp, Ln.

Programming requirements:
1. Python and/or Perl, and a good knowledge of regular expressions.
2. Shell commands (cat, sort, grep, sed, ...).
3. An additional programming language could be useful for the course project.

Team-work requirement:
The final course project will be done in groups of 4-6 students. Working in a team for the project is a requirement.
High Demand Course? Yes
Course Delivery Information
Academic year 2017/18, Available to all students (SV1) Quota:  None
Course Start Semester 1
Learning and Teaching activities: Total Hours: 200 (Lecture Hours 18, Supervised Practical/Workshop/Studio Hours 12, Summative Assessment Hours 2, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 164)
Assessment: Written Exam 60%, Coursework 40%, Practical Exam 0%
Additional Information (Assessment): The written examination will evaluate students' understanding of the fundamentals of text technologies and IR (worth 60% of the total course mark).

In addition, the coursework will include three practical assignments that test the depth of students' understanding of the basics when applied to real-life problems (worth 40% of the total course mark).

Assignments will be designed as follows:
1) Two assignments for students to work on individually (10% each).
2) One course project assignment for groups of students, 2-4 students per group (20%).
Feedback Not entered
Exam Information
Exam Diet: Main Exam Diet S2 (April/May)
Paper Name:
Hours & Minutes: 2:00
Academic year 2017/18, Part-year visiting students only (VV1) Quota:  None
Course Start Semester 1
Learning and Teaching activities: Total Hours: 200 (Lecture Hours 18, Supervised Practical/Workshop/Studio Hours 12, Summative Assessment Hours 2, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 164)
Assessment: Written Exam 60%, Coursework 40%, Practical Exam 0%
Additional Information (Assessment): The written examination will evaluate students' understanding of the fundamentals of text technologies and IR (worth 60% of the total course mark).

In addition, the coursework will include three practical assignments that test the depth of students' understanding of the basics when applied to real-life problems (worth 40% of the total course mark).

Assignments will be designed as follows:
1) Two assignments for students to work on individually (10% each).
2) One course project assignment for groups of students, 2-4 students per group (20%).
Feedback Not entered
Exam Information
Exam Diet: Main Exam Diet S1 (December)
Paper Name:
Hours & Minutes: 2:00
Learning Outcomes
On completion of this course, the student will be able to:
  1. Describe the main algorithms for processing, storing and retrieving text.
  2. Show familiarity with theoretical aspects of IR, including the major retrieval models.
  3. Discuss the range of issues involved in building a real search engine.
  4. Evaluate the effectiveness of a retrieval algorithm.
  5. Build social media applications using text processing techniques.
Reading List
Textbooks:
"Introduction to Information Retrieval", C.D. Manning, P. Raghavan and H. Schütze
"Search Engines: Information Retrieval in Practice", W. Bruce Croft, Donald Metzler, Trevor Strohman
Readings:
"Machine Learning in Automated Text Categorization", F. Sebastiani
"The Zipf Mystery", YouTube video: https://www.youtube.com/watch?v=fCn8zs912OE
"Information Retrieval", C.J. van Rijsbergen
"Recommended Reading for IR Research Students", A. Moffat, J. Zobel, D. Hawking
Additional Information
Course URL: http://www.inf.ed.ac.uk/teaching/courses/ttds
Graduate Attributes and Skills: Not entered
Additional Class Delivery Information: 18 lectures
Keywords: Not entered
Contacts
Course organiser: Dr Walid Magdy
Tel: (0131 6)51 5612
Email: wmagdy@inf.ed.ac.uk
Course secretary: Mr Gregor Hall
Tel: (0131 6)50 5194
Email: gregor.hall@ed.ac.uk