Undergraduate Course: Programming for Data Science at Scale (INFR11255)
Course Outline
School | School of Informatics |
College | College of Science and Engineering |
Credit level (Normal year taken) | SCQF Level 11 (Year 4 Undergraduate) |
Availability | Available to all students |
SCQF Credits | 10 |
ECTS Credits | 5 |
Summary | The Programming for Data Science at Scale course uses large-scale programming paradigms to equip students with the practical skills required to exploit large-scale computational resources across a distributed cluster of computers. |
Course description |
Delivery Method:
The course will be delivered through a combination of: (1) live lectures, (2) practical labs, (3) tutorials, and (4) an online discussion forum (Piazza).
Content/Syllabus:
The course will vary slightly from year to year, but will include many of the following topics:
- Introduction to large-scale data processing
- Data-parallel programming: functional collections
- Distributed data-parallel programming
- Distributed key-value processing
- Optimizing distributed data processing: shuffling and partitioning
- Distributed query processing
- Distributed graph processing
- Distributed tensor processing
|
Entry Requirements (not applicable to Visiting Students)
Pre-requisites |
|
Co-requisites | |
Prohibited Combinations | |
Other requirements | None |
Information for Visiting Students
Pre-requisites | Same as "other requirements" |
High Demand Course? |
Yes |
Course Delivery Information
|
Academic year 2024/25, Available to all students (SV1)
|
Quota: None |
Course Start |
Semester 1 |
Timetable |
Learning and Teaching activities (Further Info) |
Total Hours: 100 (Lecture Hours 18, Seminar/Tutorial Hours 4, Supervised Practical/Workshop/Studio Hours 4, Feedback/Feedforward Hours 2, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 70)
|
Assessment (Further Info) |
Written Exam 0%, Coursework 100%, Practical Exam 0%
|
Feedback |
Feedback will be provided to students in several forms: (1) Q&A on the online forum, (2) self-feedback from auto-graded quizzes and the programming assignment, (3) collective feedback on the programming assignment, and (4) feed-forward during the tutorial session for the design assignment. |
No Exam Information |
Learning Outcomes
On completion of this course, the student will be able to:
- Demonstrate an understanding of the concepts behind different large-scale programming models and their associated data models.
- Construct and justify a formulation in terms of a programming model for a given problem and implement that formulation on top of an existing framework.
- Identify how to decompose large problems into sub-problems and compose the results by applying appropriate programming models.
- Present implementations and engage in professional dialogue with peers to identify how to adapt those implementations to better meet requirements.
|
Reading List
The course will be self-contained with no required books.
A list of resources:
- Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: Simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
- Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing." 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). 2012.
- Armbrust, Michael, et al. "Spark SQL: Relational data processing in Spark." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 2015.
- Malewicz, Grzegorz, et al. "Pregel: A system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010. |
Additional Information
Graduate Attributes and Skills |
Knowledge integration: This course will help students understand different types of data and programming models.
Problem solving: The students will develop their problem-solving skills by formulating a given problem in terms of an appropriate programming model.
Applying and critiquing: The students will learn how to implement a formulation of a given problem on top of an existing framework.
Critical and analytical thinking: The students will learn to identify how to "break down" large-scale problems into discrete sub-problems and compose the results using the appropriate model. They will also gain experience in profiling and tuning an existing implementation to better meet requirements. |
Keywords | Large-scale programming, Functional programming, Query processing, Graph processing, Tensor processing |
Contacts
Course organiser | Dr Amir Shaikhha
Tel: (0131 6)50 4379
Email: amir.shaikhha@ed.ac.uk |
Course secretary | |
|
|