Undergraduate Course: Text Technologies for Data Science (UG) (INFR11229)
Course Outline
School  School of Informatics 
College  College of Science and Engineering 
Credit level (Normal year taken)  SCQF Level 11 (Year 4 Undergraduate) 
Availability  Available to all students 
SCQF Credits  20 
ECTS Credits  10 
Summary  This course follows the delivery and assessment of Text Technologies for Data Science (INFR11145) exactly. Undergraduate students must register for this course, while MSc students must register for INFR11145 instead. 

Entry Requirements (not applicable to Visiting Students)
Prerequisites 

Corequisites  
Prohibited Combinations  
Other requirements
Maths requirements:
1. Linear algebra: Strong knowledge of vectors and matrices and all related mathematical operations (addition, multiplication, inverse, projection, etc.).
2. Probability theory: Discrete and continuous univariate random variables. Bayes rule. Expectation, variance. Univariate Gaussian distribution.
3. Calculus: Functions of several variables. Partial differentiation. Multivariate maxima and minima.
4. Special functions: Log, Exp, Ln.
Programming requirements:
1. Python and/or Perl, and good knowledge of regular expressions
2. Shell commands (cat, sort, grep, sed, ...)
3. Knowledge of an additional programming language could be useful for the course project.
Teamwork requirement:
The final course project is carried out in groups of 4 to 6 students. Working in a team on the project is a requirement.
Information for Visiting Students
Prerequisites  As above. No part time visiting students permitted. 
High Demand Course? 
Yes 
Course Delivery Information

Academic year 2023/24, Available to all students (SV1)

Quota: None 
Course Start  Full Year
Course Start Date  18/09/2023
Learning and Teaching activities
Total Hours: 200 (Lecture Hours 18, Supervised Practical/Workshop/Studio Hours 12, Summative Assessment Hours 2, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 164)

Assessment
Written Exam 30%, Coursework 70%, Practical Exam 0%

Additional Information (Assessment) 
Exam 30%
Coursework 70%
Course Work 1 10%, individual work covering the implementation of a basic search engine
Course Work 2 20%, individual work covering IR evaluation and web search
Course Work 3 40%, a group project in which each group has 4 to 6 members
All of the coursework is heavy on system implementation, so familiarity with programming and software engineering is a prerequisite. Python is required for the implementation of Course Work 1 and Course Work 2. For Course Work 3, students are free to use the implementation language they prefer.
Feedback 
Not entered 
Exam Information
Exam Diet  Paper Name  Hours & Minutes
Main Exam Diet S2 (April/May)  Text Technologies for Data Science (UG) (INFR11229)  2:00
Learning Outcomes
On completion of this course, the student will be able to:
 build basic search engines from scratch, and use IR tools for searching massive collections of text documents
 build feature extraction modules for text classification
 implement evaluation scripts for IR and text classification
 understand how web search engines (such as Google) work
 work effectively in a team to produce working systems

Reading List
"Introduction to Information Retrieval", C.D. Manning, P. Raghavan and H. Schütze
"Search Engines: Information Retrieval in Practice", W. Bruce Croft, Donald Metzler and Trevor Strohman
"Machine Learning in Automated Text Categorization", F. Sebastiani
"The Zipf Mystery"
Additional research papers and videos will be recommended during lectures
Additional Information
Graduate Attributes and Skills 
Not entered 
Keywords  text processing,information retrieval,text classification 
Contacts
Course organiser  Dr Bjorn Ross
Tel: (0131 6)50 3128
Email: b.ross@ed.ac.uk 
Course secretary  Miss Yesica Marco Azorin
Tel: (0131 6)50 5113
Email: ymarcoa@ed.ac.uk 

