

Postgraduate Course: Machine Learning at Scale (EPCC11013)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Postgraduate)
Availability: Not available to visiting students
SCQF Credits: 10
ECTS Credits: 5
Summary: This course aims to teach the skills required to take machine learning, specifically deep neural networks (DNNs), from simple examples up to deployment on very large datasets or models at scale. It looks at the implementation and optimization of large-scale machine learning solutions, considering both training and inference, and targeting high-performance hardware such as GPUs and TPUs. The course will consider both utilizing common machine learning frameworks and writing standalone implementations.
Course description: Machine learning occurs at a range of scales, from small networks with small datasets or parameter counts, through to extremely large networks with millions of parameters and datasets of terabyte size. Machine learning also has two very distinct phases of operation: training and inference. To enable efficient and quick machine learning exploitation when working with very large networks or very large datasets, powerful computing hardware is required. When using significant amounts of computational hardware, there are challenges in ensuring that applications run efficiently and effectively at scale.

This course will provide the practical skills and knowledge required to run machine learning on large-scale HPC systems and hardware to deliver trained models and inference as quickly and efficiently as possible. We will work with a range of common machine learning frameworks, examining how to run them efficiently in parallel and on a range of hardware. We will also utilize parallel programming skills to develop our own implementations of machine learning functionality and augment existing framework solutions where appropriate. The course will examine a range of real-world examples where researchers have scaled machine learning up to very large computers, drawing on the current state of the art in distributed machine learning training and inference.
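The data-parallel approach described above can be illustrated in miniature. The sketch below is illustrative only (not course material): each simulated "worker" holds a shard of a batch, computes a local gradient for a simple linear-regression model, and an allreduce-style average recovers the full-batch gradient, which is the core idea behind distributed training schemes used by common frameworks. All names and data here are hypothetical.

```python
# Hypothetical sketch of data-parallel gradient averaging, using NumPy to
# simulate workers that would, in a real system, run on separate nodes/GPUs
# and combine gradients with an MPI-style allreduce.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))            # full batch: 64 samples, 4 features
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true                          # synthetic regression targets

w = np.zeros(4)                         # current model parameters

def local_grad(Xs, ys, w):
    """Mean-squared-error gradient computed on one worker's shard."""
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

n_workers = 4
shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))

# Each worker computes its gradient independently (in parallel, in practice).
grads = [local_grad(Xs, ys, w) for Xs, ys in shards]

# Allreduce step: average the per-worker gradients.
g_avg = np.mean(grads, axis=0)

# With equal shard sizes, the average equals the full-batch gradient.
g_full = local_grad(X, y, w)
assert np.allclose(g_avg, g_full)
```

Note that the equivalence holds because the shards are equally sized; with uneven shards, a weighted average by shard size would be needed, which is one of the practical details distributed training frameworks handle internally.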
Entry Requirements (not applicable to Visiting Students)
Pre-requisites Students MUST have passed: Message-Passing Programming (EPCC11002) AND Threaded Programming (EPCC11003)
It is RECOMMENDED that students have passed HPC Architectures (EPCC11004)
Co-requisites Students MUST also take: Advanced Parallel Techniques (EPCC11011)
Prohibited Combinations
Other requirements: Ability to program in Python, C, or C++, or an equivalent language. Students must be familiar with GPU programming (equivalent to that provided in Advanced Parallel Techniques).

Students will need to have an appreciation of HPC concepts (e.g. MPI, OpenMP, and system architectures), such as that provided by the recommended prerequisite courses.

Students will need to be familiar with the theory underpinning Machine Learning approaches.
Course Delivery Information
Academic year 2024/25, Not available to visiting students (SS1), Quota: 84
Course Start Semester 2
Timetable
Learning and Teaching activities (Further Info) Total Hours: 100 ( Lecture Hours 20, Supervised Practical/Workshop/Studio Hours 10, Feedback/Feedforward Hours 1, Formative Assessment Hours 2, Summative Assessment Hours 2, Revision Session Hours 1, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 62 )
Assessment (Further Info) Written Exam 50 %, Coursework 50 %, Practical Exam 0 %
Additional Information (Assessment): The course will be assessed through coursework (amounting to 50% of the overall mark), which will explore scaling applications on high-performance computing systems, and through an exam (amounting to 50% of the overall mark), which will test understanding of the taught material and its application to example scenarios.
Feedback Provided via practical classes and on assessed work.
Exam Information
Exam Diet | Paper Name | Hours & Minutes
Main Exam Diet S2 (April/May) | Machine Learning at Scale | 2:00
Learning Outcomes
On completion of this course, the student will be able to:
  1. Efficiently deploy machine learning on CPUs, GPUs, and other accelerators on a single node
  2. Understand impacts on I/O for training and inference systems and how to efficiently exploit parallel filesystems
  3. Diagnose and mitigate bottlenecks when scaling machine learning up to large datasets or large numbers of nodes and computational resources
  4. Develop custom machine learning applications
  5. Efficiently exploit and evaluate pre-existing machine learning frameworks
Reading List
Provided via Learn/Leganto
Additional Information
Graduate Attributes and Skills Problem solving and analytical thinking

Knowledge integration

Planning and time management

Situational awareness
Keywords: HPC, Machine Learning, Parallelism, Deep Neural Networks, Imaging and Vision, Data Science, Big Data
Course organiser: Mr William Jackson
Course secretary: Mr James Richards
Tel: (0131 6)51 3578