Undergraduate Course: Algorithmic Foundations of Data Science (INFR11156)
Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Year 4 Undergraduate)
Availability: Available to all students
SCQF Credits: 10
ECTS Credits: 5
Summary: The course aims to introduce algorithmic techniques that form the foundations of processing and analysing massive datasets of various forms. In particular, the course discusses how to preprocess massive datasets, store them efficiently, design fast algorithms for them, and analyse the performance of the algorithms designed. Through various examples and the coursework, students will see applications of the topics discussed in class in other areas of computer science, e.g., machine learning and network science.
Course description 
The course discusses algorithmic techniques that form the foundations of processing and analysing massive datasets of various forms. Specific techniques covered include effective representation of datasets, extracting useful information from a dataset using algebraic tools, and designing faster algorithms based on sampling and sketching techniques. Students will learn these techniques through intuition, theoretical reasoning, and practical examples.
The syllabus includes:
High-dimensional spaces
Best-fit subspaces and singular value decomposition
Spectral algorithms for massive datasets
Data streaming algorithms
Clustering
Graph sparsification
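To give a flavour of the "Data streaming algorithms" topic, here is a minimal illustrative sketch of the classic Misra-Gries frequent-items algorithm, which scans a stream once while keeping at most k-1 counters. It is an assumed example of the kind of algorithm the topic covers, not necessarily how the course presents it; the function name `misra_gries` and the parameter `k` are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Illustrative Misra-Gries sketch (hypothetical helper).

    Keeps at most k - 1 counters. For a stream of length n, every item
    occurring more than n / k times is guaranteed to be among the keys,
    and each stored count c for an item x satisfies
    freq(x) - n / k <= c <= freq(x).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Item already tracked: increment its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            # Each such step discards k occurrences (k - 1 stored plus
            # the new item), so it happens at most n / k times.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, `misra_gries("abacabadabacaba", 3)` must keep `"a"`, which occurs 8 times in a stream of length 15, more than 15 / 3 = 5. The design trades exact counts for memory that depends only on k, not on the stream length.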

Entry Requirements (not applicable to Visiting Students)
Prerequisites 
It is RECOMMENDED that students have passed
Algorithms and Data Structures (INFR10052)

Corequisites  
Prohibited Combinations  
Other requirements: This course has the following mathematics prerequisites:
1 Calculus: limits, sums, integration, differentiation, recurrence relations
2 Graph theory: graphs, digraphs, trees
3 Probability: random variables, expectation, variance, Markov's inequality, Chebyshev's inequality
4 Linear algebra: vectors, matrices, eigenvectors and eigenvalues, rank
5 Students should be familiar with the definition and use of big-O notation, and must be comfortable both reading and constructing mathematical proofs using various methods such as proof by induction and proof by contradiction.
Information for Visiting Students
Prerequisites: This course has the following mathematics prerequisites:
1 Calculus: limits, sums, integration, differentiation, recurrence relations
2 Graph theory: graphs, digraphs, trees
3 Probability: random variables, expectation, variance, Markov's inequality, Chebyshev's inequality
4 Linear algebra: vectors, matrices, eigenvectors and eigenvalues, rank
5 Students should be familiar with the definition and use of big-O notation, and must be comfortable both reading and constructing mathematical proofs using various methods such as proof by induction and proof by contradiction.
High Demand Course? Yes
Course Delivery Information

Academic year 2018/19, Available to all students (SV1)

Quota: None 
Course Start: Semester 1
Learning and Teaching activities
Total Hours: 100 (Lecture Hours 15, Seminar/Tutorial Hours 5, Feedback/Feedforward Hours 8, Summative Assessment Hours 2, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 68)

Assessment
Written Exam 75%, Coursework 25%, Practical Exam 0%

Additional Information (Assessment) 
The course assessment consists of a written exam and coursework.
The written exam tests a student's understanding of the algorithm design and analysis techniques discussed in class, as well as the student's ability to apply these techniques to design and analyse new algorithms. This corresponds to Intended Learning Outcomes 1-4.
The coursework tests a student's ability to solve more complex algorithmic problems arising in practice and to use appropriate software to analyse massive datasets. This corresponds to Intended Learning Outcomes 3-5.
Feedback 
A sample solution to the coursework will be released one week after the coursework deadline. In addition to feedback on the coursework, we will provide students with solutions to the exercise questions posed in class or listed in the main textbook. We will also hold a 1-hour drop-in session every week to answer students' questions about that week's lectures.
Exam Information
Exam Diet: Main Exam Diet S1 (December)
Hours & Minutes: 2:00
Learning Outcomes
On completion of this course, the student will be able to:
1 Demonstrate familiarity with the fundamentals of processing massive datasets.
2 Describe and compare the various algorithmic design techniques covered in the syllabus for processing massive datasets.
3 Apply the learned techniques to design efficient algorithms for massive datasets.
4 Apply basic knowledge of linear algebra and probability theory to prove the efficiency of the designed algorithms.
5 Use appropriate software to solve certain algorithmic problems for a given dataset.

Reading List
The main textbook for the course is:
Avrim Blum, John Hopcroft, and Ravindran Kannan: Foundations of Data Science.
https://www.cs.cornell.edu/jeh/book.pdf 
Additional Information
Graduate Attributes and Skills 
On completing the course, a student should be able to apply the learned mathematical knowledge to analyse and process massive datasets, and use these tools to solve algorithmic problems arising in practice.
Keywords: Machine Learning, Computer Science, Artificial Intelligence, Theoretical Computer Science
Contacts
Course organiser: Dr He Sun
Tel: (0131 6)51 5622
Email: H.Sun@ed.ac.uk
Course secretary: Mr Gregor Hall
Tel: (0131 6)50 5194
Email: gregor.hall@ed.ac.uk

