THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2024/2025

Timetable information in the Course Catalogue may be subject to change.

Undergraduate Course: Programming for Data Science at Scale (INFR11255)

Course Outline
School	School of Informatics	College	College of Science and Engineering
Credit level (Normal year taken)	SCQF Level 11 (Year 4 Undergraduate)	Availability	Available to all students
SCQF Credits	10	ECTS Credits	5
Summary	The Programming for Data Science at Scale course uses large-scale programming paradigms to equip students with the practical skills required to exploit computational resources across a distributed cluster of computers.
Course description	Delivery Method: The course will be delivered through a combination of: (1) live lectures, (2) practical labs, (3) tutorials, and (4) an online discussion forum (Piazza).
Content/Syllabus: The course will vary slightly from year to year, but will include many of the following topics:
- Introduction to large-scale data processing
- Data-parallel programming: functional collections
- Distributed data-parallel programming
- Distributed key-value processing
- Optimizing distributed data processing: shuffling and partitioning
- Distributed query processing
- Distributed graph processing
- Distributed tensor processing
As this is a practical course covering a large number of topics from separate areas, it is assessed by coursework only. For proper evaluation, students must be presented with real problems, rather than "toy" ones that can be solved in a very limited time. The evaluation is based on the following components:
1) Quizzes - learning outcomes 1, 2, 3.
2) Programming assignment - learning outcome 2.
3) Design assignment - learning outcomes 1, 3, 4.

Entry Requirements (not applicable to Visiting Students)
Pre-requisites		Co-requisites
Prohibited Combinations		Other requirements	None

Information for Visiting Students
Pre-requisites	The nature of this course means that assessment is only possible while the course is running. Any students entitled to a resit (e.g., visiting students, resits for professional purposes, ordinary degree students, or students with null sits) would need to retake the course in the following academic year.
High Demand Course?	Yes

Course Delivery Information

Academic year 2024/25, Available to all students (SV1)		Quota: None
Course Start	Semester 1
Learning and Teaching activities (Further Info)	Total Hours: 100 (Lecture Hours 18, Seminar/Tutorial Hours 4, Supervised Practical/Workshop/Studio Hours 4, Feedback/Feedforward Hours 2, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 70)
Assessment (Further Info)	Written Exam 0 %, Coursework 100 %, Practical Exam 0 %
Feedback	Feedback will be provided to students in several forms: (1) Q&A over the online forum, (2) self-feedback from the auto-graded quizzes and programming assignment, (3) collective feedback on the programming assignment, (4) feed-forward during the tutorial session for the design assignment.
No Exam Information

Learning Outcomes
On completion of this course, the student will be able to:
1. Demonstrate an understanding of the concepts behind different large-scale programming models and their associated data models.
2. Construct and justify a formulation in terms of a programming model for a given problem and implement that formulation on top of an existing framework.
3. Identify how to decompose large problems into sub-problems and compose the results by applying appropriate programming models.
4. Present implementations and engage in professional dialogue with peers to identify and adapt those implementations to better meet requirements.

Reading List
The course will be self-contained with no required books. A list of resources:
- Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: Simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
- Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing." 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). 2012.
- Armbrust, Michael, et al. "Spark SQL: Relational data processing in Spark." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 2015.
- Malewicz, Grzegorz, et al. "Pregel: A system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010.

Additional Information
Course URL	https://opencourse.inf.ed.ac.uk/pdss
Graduate Attributes and Skills	Knowledge integration: This course will help students understand different types of data and programming models.
Problem solving: Students will develop their problem-solving skills by formulating a given problem in terms of an appropriate programming model.
Applying and critiquing: Students will learn how to implement a formulation of a given problem on top of an existing framework.
Critical and analytical thinking: Students will learn to identify how to break down large-scale problems into discrete sub-problems and compose the results using the appropriate model. They will also gain experience in profiling and tuning an existing implementation to better meet requirements.
Keywords	Large-scale programming, Functional programming, Query processing, Graph processing, Tensor processing

Contacts
Course organiser	Dr Amir Shaikhha Tel: (0131 6)50 4379 Email: amir.shaikhha@ed.ac.uk	Course secretary	Miss Yesica Marco Azorin Tel: (0131 6)50 5194 Email: ymarcoa@ed.ac.uk
