# DEGREE REGULATIONS & PROGRAMMES OF STUDY 2018/2019


# Undergraduate Course: Text Technologies for Data Science (INFR11145)

School: School of Informatics

College: College of Science and Engineering

Credit level (Normal year taken): SCQF Level 11 (Year 4 Undergraduate)

Availability: Available to all students

SCQF Credits: 20

ECTS Credits: 10

Summary: This course teaches the basic technologies required for text processing, focusing mainly on information retrieval and text classification. It gives a detailed overview of information retrieval and describes how search engines work. It also covers the main steps of text classification. This is a highly practical course: at least 50% of what is taught will be implemented from scratch in coursework and labs, and students are required to complete a final project in small groups. All lectures, labs, and two pieces of coursework take place in Semester 1. The final group project is due early in Semester 2, by week 3 or 4.

Course description: Syllabus:

* Introduction to IR and text processing, system components
* Zipf, Heaps, and other text laws
* Pre-processing: tokenization, normalisation, stemming, stopping
* Indexing: inverted index, boolean and proximity search
* Evaluation methods and measures (e.g., precision, recall, MAP, significance testing)
* Query expansion
* IR toolkits and applications
* Ranked retrieval and learning to rank
* Text classification: feature extraction, baselines, evaluation
* Web search
Pre-requisites: Students MUST have passed: (none listed)

Co-requisites: (none listed)

Prohibited Combinations: (none listed)

Other requirements:

Maths requirements:

1. Linear algebra: Strong knowledge of vectors and matrices with all related mathematical operations (addition, multiplication, inverse, projections, etc.).
2. Probability theory: Discrete and continuous univariate random variables. Bayes rule. Expectation, variance. Univariate Gaussian distribution.
3. Calculus: Functions of several variables. Partial differentiation. Multivariate maxima and minima.
4. Special functions: Log, Exp, Ln.

Programming requirements:

1. Python and/or Perl, and good knowledge of regular expressions.
2. Shell commands (cat, sort, grep, sed, ...).
3. An additional programming language could be useful for the course project.

Team-work requirement: The final course project is done in groups of 4-6 students. Working in a team for the project is a requirement.
Pre-requisites:

Maths requirements:

1. Linear algebra: Strong knowledge of vectors and matrices with all related mathematical operations (addition, multiplication, inverse, projections, etc.).
2. Probability theory: Discrete and continuous univariate random variables. Bayes rule. Expectation, variance. Univariate Gaussian distribution.
3. Calculus: Functions of several variables. Partial differentiation. Multivariate maxima and minima.
4. Special functions: Log, Exp, Ln.

Programming requirements:

1. Python and/or Perl, and good knowledge of regular expressions.
2. Shell commands (cat, sort, grep, sed, ...).
3. An additional programming language could be useful for the course project.

Team-work requirement: The final course project is done in groups of 4-6 students. Working in a team for the project is a requirement.

High Demand Course? Yes
Academic year 2018/19, Available to all students (SV1)

Quota: None

Course Start: Full Year

Learning and Teaching activities: Total Hours: 200 (Lecture Hours 18, Supervised Practical/Workshop/Studio Hours 12, Summative Assessment Hours 2, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 164)

Assessment: Written Exam 50%, Coursework 50%, Practical Exam 0%

Additional Information (Assessment): The written examination will evaluate students' understanding of the fundamentals of text technologies and IR. Coursework will include two practical assignments to show the depth of understanding of the basics of IR and text classification, and a group project in which a team of students applies some of the knowledge gained during the course to implement a running application. Coursework will be designed as follows:

1. Two assignments for students to work on individually (worth 20% in total).
2. One final course project, to be completed in small groups (worth 30%). This project is to be submitted near the beginning of the second semester.

Feedback: Not entered

Exam Information:

| Exam Diet | Paper Name | Hours & Minutes |
| --- | --- | --- |
| Main Exam Diet S2 (April/May) | Text Technologies for Data Science | 2:00 |
On completion of this course, the student will be able to:

1. Build basic search engines from scratch, and use IR tools for searching massive collections of text documents
2. Build feature extraction modules for text classification
3. Implement evaluation scripts for IR and text classification
4. Understand how web search engines (such as Google) work
5. Work effectively in a team to produce working systems

Reading list:

* "Introduction to Information Retrieval", C.D. Manning, P. Raghavan and H. Schutze
* "Search Engines: Information Retrieval in Practice", W. Bruce Croft, Donald Metzler, Trevor Strohman
* "Machine Learning in Automated Text Categorization", F. Sebastiani
* "The Zipf Mystery"
* Additional research papers and videos to be recommended during lectures
Course URL: http://www.inf.ed.ac.uk/teaching/courses/ttds

Graduate Attributes and Skills: Not entered

Additional Class Delivery Information: 18 lectures

Keywords: Not entered
Course organiser: Dr Walid Magdy, Tel: (0131 6)51 5612, Email: wmagdy@inf.ed.ac.uk

Course secretary: Mr Gregor Hall, Tel: (0131 6)50 5194, Email: gregor.hall@ed.ac.uk