Undergraduate Course: Text Technologies for Data Science (INFR11100)
Course Outline
School | School of Informatics |
College | College of Science and Engineering |
Course type | Standard |
Availability | Available to all students |
Credit level (Normal year taken) | SCQF Level 11 (Year 4 Undergraduate) |
Credits | 10 |
Home subject area | Informatics |
Other subject area | None |
Course website | None |
Taught in Gaelic? | No |
Course description | The course covers the retrieval technologies behind search engines such as Google. It aims to balance the theoretical and system-related aspects of the field. The course will cover:
1. Theoretical aspects, including properties of text, queries, relevance, major retrieval models and evaluation;
2. System-related aspects, including crawlers, text processing, index construction and retrieval algorithms. |
Entry Requirements (not applicable to Visiting Students)
Pre-requisites |
|
Co-requisites | |
Prohibited Combinations | Students MUST NOT also be taking Text Technologies (Level 11) (INFR11027) |
Other requirements | This course is open to all Informatics students, including those on joint degrees. External students whose DPT does not list this course should seek special permission from the course organiser.
This course has the following mathematics prerequisites:
1. Probability theory: random variables, expectation, joint and conditional probabilities; discrete and continuous univariate distributions.
2. Algebra: definition of vectors and matrices; vector addition and inner product; matrix multiplication.
3. Calculus: functions of several variables, univariate integrals and derivatives, univariate maxima and minima.
4. Special functions: log, exp. |
Additional Costs | None |
Information for Visiting Students
Pre-requisites | Visiting students are required to have a background comparable to that assumed by the course prerequisites listed in the Degree Regulations & Programmes of Study. If in doubt, consult the course lecturer. |
Displayed in Visiting Students Prospectus? | Yes |
Course Delivery Information
|
Delivery period: 2014/15 Semester 1, Available to all students (SV1)
|
Learn enabled: No |
Quota: None |
|
Web Timetable |
Course Start Date |
15/09/2014 |
Breakdown of Learning and Teaching activities (Further Info) |
Total Hours: 100 (Lecture Hours 20, Summative Assessment Hours 2, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 76) |
Additional Notes |
|
Breakdown of Assessment Methods (Further Info) |
Written Exam 70%, Coursework 30%, Practical Exam 0% |
Exam Information |
Exam Diet | Paper Name | Hours & Minutes |
Main Exam Diet S2 (April/May) | | 2:00 | |
|
Delivery period: 2014/15 Semester 1, Part-year visiting students only (VV1)
|
Learn enabled: No |
Quota: None |
|
Web Timetable |
Course Start Date |
15/09/2014 |
Breakdown of Learning and Teaching activities (Further Info) |
Total Hours: 100 (Lecture Hours 20, Summative Assessment Hours 2, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 76) |
Additional Notes |
|
Breakdown of Assessment Methods (Further Info) |
Written Exam 70%, Coursework 30%, Practical Exam 0% |
Exam Information |
Exam Diet | Paper Name | Hours & Minutes |
Main Exam Diet S1 (December) | | 2:00 | |
Summary of Intended Learning Outcomes
1 - Describe the main algorithms for processing, storing and retrieving text.
2 - Show familiarity with theoretical aspects of IR, including the major retrieval models.
3 - Discuss the range of issues involved in building a real search engine.
4 - Evaluate the effectiveness of a retrieval algorithm. |
Assessment Information
Assessment is a combination of problem sets and programming exercises involving the application of existing algorithms and evaluation techniques.
You should expect to spend approximately 24 hours on the coursework for this course.
If delivered in semester 1, this course offers an option for visiting undergraduate students who attend semester 1 only, with assessment completed before the end of the calendar year. |
Special Arrangements
None |
Additional Information
Academic description |
Not entered |
Syllabus |
Lectures will cover the following topics, with a typical lecture integrating material from more than one aspect.
1. Theoretical aspects:
* The nature of text, Zipf's and Heaps' laws, clumping
* Information needs, queries and relevance
* Evaluation of retrieval systems
* Vector-space model and latent semantic indexing
* Probabilistic model and relevance feedback
* Language models or Relevance models
2. Systems aspects:
* Search engine architecture
* Crawling and content extraction
* Text processing and representation
* Indexing methods and compression
* Distributed search and meta-search
* Dealing with vocabulary mismatch
* Duplicate detection |
Transferable skills |
Not entered |
Reading list |
* Search Engines: Information Retrieval in Practice, W.B. Croft, D. Metzler, T. Strohman, Addison Wesley, 2008. Primary text; photocopies will be provided by the instructor.
* Introduction to Information Retrieval, C.D. Manning, P. Raghavan, H. Schütze, Cambridge University Press, 2008.
* Managing Gigabytes, I.H. Witten, A. Moffat, T.C. Bell, Morgan Kaufmann, 1999.
* Information Retrieval, C.J. van Rijsbergen, Butterworths, 1979.
* Recommended Reading for IR Research Students, A. Moffat, J. Zobel, D. Hawking, SIGIR Forum, 39(2), 2005. |
Study Abroad |
Not entered |
Study Pattern |
Not entered |
Keywords | Not entered |
Contacts
Course organiser | Dr Victor Lavrenko
Tel: (0131 6)51 5612
Email: vlavrenk@inf.ed.ac.uk |
Course secretary | Miss Claire Edminson
Tel: (0131 6)51 7607
Email: C.Edminson@ed.ac.uk |
|
© Copyright 2014 The University of Edinburgh - 29 August 2014 4:12 am
|