Postgraduate Course: Machine Learning at Scale (EPCD11013)
Course Outline
School | School of Informatics |
College | College of Science and Engineering |
Credit level (Normal year taken) | SCQF Level 11 (Postgraduate) |
Course type | Online Distance Learning |
Availability | Not available to visiting students |
SCQF Credits | 10 |
ECTS Credits | 5 |
Summary | This course aims to teach the skills required to take machine learning, specifically deep neural networks (DNNs), from simple examples up to deployment on very large datasets or models at scale. It looks at the implementation and optimization of large-scale machine learning solutions, considering both training and inference, and targeting high-performance hardware such as GPUs and TPUs. The course will consider both using common machine learning frameworks and writing standalone implementations. |
Course description |
Machine learning occurs at a range of scales, from small networks with small datasets or parameter counts, through to extremely large networks with millions of parameters and datasets of terabyte sizes. Machine learning also has two very distinct phases of operation: training and inference. Training or running inference quickly and efficiently on very large networks or very large datasets requires powerful computing hardware. When using significant amounts of computational hardware, there are challenges in ensuring that applications run efficiently and effectively at scale.
This course will provide the practical skills and knowledge required to run machine learning on large-scale HPC systems and hardware to deliver trained models and inference as quickly and efficiently as possible. We will work with a range of common machine learning frameworks, examining how to run them efficiently in parallel and on a range of hardware. We will also use parallel programming skills to develop our own implementations of machine learning functionality and augment existing framework solutions, where appropriate. The course will evaluate a range of real-world examples where researchers have scaled machine learning up to very large computers, and draw lessons from the current state of the art in distributed machine learning training and inference.
|
Course Delivery Information
|
Academic year 2024/25, Not available to visiting students (SS1)
|
Quota: None |
Course Start |
Semester 2 |
Course Start Date |
13/01/2025 |
Timetable |
Timetable |
Learning and Teaching activities (Further Info) |
Total Hours: 100 (Online Activities 30, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 68)
|
Assessment (Further Info) |
Written Exam 0 %, Coursework 100 %, Practical Exam 0 %
|
Additional Information (Assessment) |
100% Coursework split into two assignments:
1) Traditional coursework exercise (50%)
2) Exam-style short answer questions to be taken over 1-2 weeks maximum (50%) |
Feedback |
Provided via live session discussion of material and practical exercises and on assessed work. |
No Exam Information |
Learning Outcomes
On completion of this course, the student will be able to:
- Efficiently deploy machine learning on CPUs, GPUs, and other accelerators on a single node
- Understand impacts on I/O for training and inference systems and how to efficiently exploit parallel filesystems
- Diagnose and mitigate bottlenecks in scaling machine learning up to large datasets or large numbers of nodes and computational resources
- Develop custom machine learning applications
- Efficiently exploit and evaluate pre-existing machine learning frameworks
|
Reading List
Provided via Learn/Leganto and live sessions based on discussion topics raised |
Additional Information
Graduate Attributes and Skills |
Problem solving and analytical thinking
Knowledge integration
Planning and time management
Situational awareness |
Special Arrangements |
This is the Online Learning version of on-campus course EPCC11013 Machine Learning at Scale. On-campus students should refer to that course. |
Keywords | HPC,Machine Learning,Parallelism,Deep Neural Networks,Imaging and Vision,Data Science,Big Data |
Contacts
Course organiser | Mr William Jackson
Tel:
Email: Adrian.Jackson@ed.ac.uk |
Course secretary | Mr James Richards
Tel: (0131 6)51 3578
Email: J.Richards@epcc.ed.ac.uk |
|
|