

Postgraduate Course: Machine Learning at Scale (EPCC11013)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Postgraduate)
Availability: Not available to visiting students
SCQF Credits: 10
ECTS Credits: 5
Summary: This course aims to teach the skills required to take machine learning, specifically deep neural networks (DNNs), from simple examples up to deployment on very large datasets or models at scale. It looks at the implementation and optimization of large-scale machine learning solutions, considering both training and inference, and targeting high-performance hardware such as GPUs and TPUs. The course will consider both utilizing common machine learning frameworks and writing standalone implementations.
Course description: Machine learning occurs at a range of scales, from small networks with small datasets or parameter counts, through to extremely large networks with millions of parameters and datasets of terabyte size. Machine learning also has two very distinct phases of operation: training and inference. To enable efficient and quick machine learning exploitation when working with very large networks or very large datasets, powerful computing hardware is required. When using significant amounts of computational hardware, there are challenges in ensuring that applications run efficiently and effectively at scale.

This course will provide the practical skills and knowledge required to run machine learning on large-scale HPC systems and hardware to deliver trained models and inference as quickly and efficiently as possible. We will work with a range of common machine learning frameworks, examining how to run them efficiently in parallel and on a range of hardware. We will also utilize parallel programming skills to develop our own implementations of machine learning functionality and augment existing framework solutions where appropriate. The course will examine a range of real-world examples where researchers have scaled machine learning up to very large computers, drawing on the current state of the art in distributed machine learning training and inference.
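The data-parallel approach described above can be illustrated in miniature. The sketch below is illustrative only (not course material): each simulated "worker" holds a shard of a batch, computes a local gradient for a simple linear-regression model, and an allreduce-style average recovers the full-batch gradient, which is the core idea behind distributed training schemes used by common frameworks. All names and data here are hypothetical.

```python
# Hypothetical sketch of data-parallel gradient averaging, using NumPy to
# simulate workers that would, in a real system, run on separate nodes/GPUs
# and combine gradients with an MPI-style allreduce.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))            # full batch: 64 samples, 4 features
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true                          # synthetic regression targets

w = np.zeros(4)                         # current model parameters

def local_grad(Xs, ys, w):
    """Mean-squared-error gradient computed on one worker's shard."""
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

n_workers = 4
shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))

# Each worker computes its gradient independently (in parallel, in practice).
grads = [local_grad(Xs, ys, w) for Xs, ys in shards]

# Allreduce step: average the per-worker gradients.
g_avg = np.mean(grads, axis=0)

# With equal shard sizes, the average equals the full-batch gradient.
g_full = local_grad(X, y, w)
assert np.allclose(g_avg, g_full)
```

Note that the equivalence holds because the shards are equally sized; with uneven shards, a weighted average by shard size would be needed, which is one of the practical details distributed training frameworks handle internally.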
Entry Requirements (not applicable to Visiting Students)
Pre-requisites Students MUST have passed: Message-Passing Programming (EPCC11002) AND Threaded Programming (EPCC11003)
It is RECOMMENDED that students have passed HPC Architectures (EPCC11004)
Co-requisites Students MUST also take: Advanced Parallel Techniques (EPCC11011)
Prohibited Combinations
Other requirements: Ability to program in Python, C, or C++, or an equivalent language. Students must be familiar with GPU programming (equivalent to that provided in Advanced Parallel Techniques).

Students will need to have an appreciation of HPC concepts (e.g. MPI, OpenMP, and system architectures), such as that provided by the recommended prerequisite courses.

Students will need to be familiar with the theory underpinning Machine Learning approaches.
Course Delivery Information
Academic year 2024/25, Not available to visiting students (SS1), Quota: 84
Course Start Semester 2
Timetable
Learning and Teaching activities (Further Info) Total Hours: 100 ( Lecture Hours 20, Supervised Practical/Workshop/Studio Hours 10, Feedback/Feedforward Hours 1, Formative Assessment Hours 2, Summative Assessment Hours 2, Revision Session Hours 1, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 62 )
Assessment (Further Info) Written Exam 50 %, Coursework 50 %, Practical Exam 0 %
Additional Information (Assessment): The course will be assessed through coursework (amounting to 50% of the overall mark), which will explore scaling applications on high-performance computing systems, and through an exam (amounting to 50% of the overall mark), which will test understanding of the taught material and its application to example scenarios.
Feedback Provided via practical classes and on assessed work.
Exam Information
Exam Diet | Paper Name | Hours & Minutes
Main Exam Diet S2 (April/May) | Machine Learning at Scale | 2:00
Learning Outcomes
On completion of this course, the student will be able to:
  1. Efficiently deploy machine learning on CPUs, GPUs, and other accelerators on a single node
  2. Understand impacts on I/O for training and inference systems and how to efficiently exploit parallel filesystems
  3. Diagnose and mitigate bottlenecks when scaling machine learning up to large datasets or large numbers of nodes and computational resources
  4. Develop custom machine learning applications
  5. Efficiently exploit and evaluate pre-existing machine learning frameworks
Reading List
Provided via Learn/Leganto
Additional Information
Graduate Attributes and Skills Problem solving and analytical thinking

Knowledge integration

Planning and time management

Situational awareness
Keywords: HPC, Machine Learning, Parallelism, Deep Neural Networks, Imaging and Vision, Data Science, Big Data
Course organiser: Mr William Jackson
Course secretary: Mr James Richards
Tel: (0131 6)51 3578