THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2024/2025

Timetable information in the Course Catalogue may be subject to change.


Undergraduate Course: Programming for Data Science at Scale (INFR11255)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Year 4 Undergraduate)
Availability: Available to all students
SCQF Credits: 10
ECTS Credits: 5
Summary: The Programming for Data Science at Scale course uses large-scale programming paradigms to equip students with the practical skills required to leverage large-scale computational resources across a distributed cluster of computers.
Course description

Delivery Method:
The course will be delivered through a combination of: (1) live lectures, (2) practical labs, (3) tutorials, and (4) an online discussion forum (Piazza forum).

Content/Syllabus:
The course will vary slightly from year to year, but will include many of the following topics:
- Introduction to large-scale data processing
- Data-parallel programming: functional collections
- Distributed Data-parallel programming
- Distributed Key-value processing
- Optimizing distributed data processing: Shuffling and partitioning
- Distributed Query processing
- Distributed Graph processing
- Distributed Tensor processing
Entry Requirements (not applicable to Visiting Students)
Pre-requisites:
Co-requisites:
Prohibited Combinations:
Other requirements: None
Information for Visiting Students
Pre-requisites: Same as "other requirements"
High Demand Course? Yes
Course Delivery Information
Academic year 2024/25, Available to all students (SV1)
Quota: None
Course Start: Semester 1
Learning and Teaching activities: Total Hours: 100 (Lecture Hours 18, Seminar/Tutorial Hours 4, Supervised Practical/Workshop/Studio Hours 4, Feedback/Feedforward Hours 2, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 70)
Assessment: Written Exam 0%, Coursework 100%, Practical Exam 0%
Feedback: Students will receive feedback in several forms: (1) Q&A on the online forum, (2) self-feedback from auto-graded quizzes and the programming assignment, (3) collective feedback on the programming assignment, and (4) feed-forward during the tutorial session for the design assignment.
No Exam Information
Learning Outcomes
On completion of this course, the student will be able to:
  1. Demonstrate an understanding of the concepts behind different large-scale programming models and their associated data models.
  2. Construct and justify a formulation in terms of a programming model for a given problem and implement that formulation on top of an existing framework.
  3. Identify how to decompose large problems into sub-problems and compose the results by applying appropriate programming models.
  4. Present implementations and engage in professional dialogue with peers to identify and adapt those implementations to better meet requirements.
Reading List
The course will be self-contained with no required books.

A list of resources:
- Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: Simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
- Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing." 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). 2012.
- Armbrust, Michael, et al. "Spark SQL: Relational data processing in Spark." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 2015.
- Malewicz, Grzegorz, et al. "Pregel: A system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010.
Additional Information
Graduate Attributes and Skills
Knowledge integration: This course will help students understand different types of data and programming models.
Problem solving: The students will develop their problem-solving skills by formulating a given problem in terms of an appropriate programming model.
Applying and critiquing: The students will learn how to implement a formulation of a given problem on top of an existing framework.
Critical and analytical thinking: The students will learn to identify how to "break down" large-scale problems into discrete problems and compose the results using the appropriate model. They will also gain experience in profiling and tuning an existing implementation to better meet requirements.
Keywords: Large-scale programming, Functional programming, Query processing, Graph processing, Tensor processing
Contacts
Course organiser: Dr Amir Shaikhha
Tel: (0131 6)50 4379
Email: amir.shaikhha@ed.ac.uk
Course secretary