Undergraduate Course: Text Technologies for Data Science (UG) (INFR11229)
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Year 4 Undergraduate)
Availability: Available to all students
Summary: This course follows the delivery and assessment of Text Technologies for Data Science (INFR11145) exactly. Undergraduate students must register for this course, while MSc students must register for INFR11145 instead.
Entry Requirements (not applicable to Visiting Students)
Other requirements:
Mathematics requirements:
1. Linear algebra: Strong knowledge of vectors and matrices with all related mathematical operations (addition, multiplication, inverse, projections, etc.).
2. Probability theory: Discrete and continuous univariate random variables. Bayes rule. Expectation, variance. Univariate Gaussian distribution.
3. Calculus: Functions of several variables. Partial differentiation. Multivariate maxima and minima.
4. Special functions: Log, Exp, Ln.
Programming requirements:
1. Python and/or Perl, and a good knowledge of regular expressions
2. Shell commands (cat, sort, grep, sed, ...)
3. Knowledge of an additional programming language could be useful for the course project.
The final course project is carried out in groups of 4-6 students. Working in a team on the project is a requirement.
Information for Visiting Students
Pre-requisites: As above. No part-time visiting students permitted.
Course Delivery Information
Academic year 2022/23, Available to all students (SV1)
Learning and Teaching activities:
- Lecture Hours: 18
- Supervised Practical/Workshop/Studio Hours: 12
- Summative Assessment Hours: 2
- Programme Level Learning and Teaching Hours: 4
- Directed Learning and Independent Learning Hours
Additional Information (Assessment):
Course Work 1 (10%): individual work covering the implementation of a basic search engine
Course Work 2 (20%): individual work covering IR evaluation and web search
Course Work 3 (40%): a group project, with 4-6 members per group
All of the coursework is heavily focused on system implementation, so familiarity with programming and software engineering is a prerequisite. Python is required for Course Work 1 and Course Work 2. For Course Work 3, students may use whichever implementation language they prefer.
Main Exam Diet S2 (April/May): Text Technologies for Data Science (UG) (INFR11229), duration 2:00 (hours:minutes)
On completion of this course, the student will be able to:
- Build basic search engines from scratch and use IR tools to search massive collections of text documents
- Build feature extraction modules for text classification
- Implement evaluation scripts for IR and text classification
- Understand how web search engines (such as Google) work
- Work effectively in a team to produce working systems
"Introduction to Information Retrieval", C. D. Manning, P. Raghavan and H. Schütze
"Search Engines: Information Retrieval in Practice", W. B. Croft, D. Metzler and T. Strohman
"Machine Learning in Automated Text Categorization", F. Sebastiani
"The Zipf Mystery"
Additional research papers and videos will be recommended during lectures.
Keywords: text processing, information retrieval, text classification
Course organiser: Dr Walid Magdy
Tel: (0131 6)51 5612
Course secretary: Mrs Helen Tweedale
Tel: (0131 6)50 3827