THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2024/2025

Timetable information in the Course Catalogue may be subject to change.


Postgraduate Course: Fundamentals of HPC System Administration (EPCD11021)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Postgraduate)
Course type: Online Distance Learning
Availability: Not available to visiting students
SCQF Credits: 10
ECTS Credits: 5
Summary: High Performance Computing (HPC) is a multidisciplinary field combining complex computer architectures, system software, parallel programming languages, algorithms, tools and scientific applications. It focuses on solving computationally intensive problems in parallel by distributing them across a large number of processing units. State-of-the-art HPC systems are highly heterogeneous (including CPUs, GPUs and other compute components) and are used by a wide range of scientific communities.

This course covers the basics of HPC system administration, focusing on elements that allow those massively parallel systems to be used effectively by many users with different requirements at the same time. Concepts covered include HPC networks and interconnects, parallel file systems, scheduling and queue management, and software and user environments.
Course description: This module will be taught by administrators with experience across multiple Tier-1 and Tier-2 National HPC resources, in a direct and practical manner, implementing concepts as they are taught and as they are used in live environments. Material will be made available during the course for self-paced learning, with supplemental sessions where possible, including guest lectures from vendors and other specialists. Interactive sessions of the course will focus on practical work and facilitate discussion of coursework with experts in their field.


The course covers:

-HPC system administration from a professional perspective.

-Implementing open-source technologies to deploy a functional HPC cluster using virtual resources.

-Sharing developed best practices and lessons learned from hands-on experience operating multiple national HPC services.

-Approaches for investigating technical issues.

-Understanding functions of cluster management (incl. overview of underpinning technologies).

-Deployment, configuration, and management of: Authentication service; Parallel filesystems; Scheduler and resource manager; Monitoring and logging solutions

-Deployment and configuration of user access and environment customisation.

-Understanding of different performant network interconnect designs and technologies.

-Development of scheduling and queue management strategies for efficient workload management.

-Using Infrastructure as Code (IaC) practices for configuration management.

-Usage of automation to improve and maintain services and user experience.

-A high-level overview of service management processes including change enablement and documenting standard operating procedures.


Students will demonstrate the learning outcomes by building a virtual HPC cluster which will require a parallel filesystem, centralised authentication, scheduler, and user access host.
Entry Requirements (not applicable to Visiting Students)
Pre-requisites:
Co-requisites:
Prohibited Combinations:
Other requirements: None
Course Delivery Information
Academic year 2024/25, Not available to visiting students (SS1)
Quota: None
Course Start: Semester 2
Learning and Teaching activities: Total Hours: 100 (Online Activities 30, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 68)
Assessment: Written Exam 0%, Coursework 100%, Practical Exam 0%
Additional Information (Assessment): Coursework (100%): The coursework will require the student to design, implement and evaluate a multi-node compute cluster with access to a shared file system, under the control of a queue management system, and to provide software environments for a set of identified user groups.

The student will be required to submit a design for their cluster, implement the design, submit the cluster for acceptance tests and evaluate the cluster performance and implementation. This will be supplemented by a reflection on how the student approached the scenario.
Feedback: Not entered
No Exam Information
Learning Outcomes
On completion of this course, the student will be able to:
  1. Understand and evaluate the technologies underpinning an HPC compute resource.
  2. Implement and evaluate parallel file systems in a compute resource.
  3. Design and analyse workload scheduling and queue management strategies and technologies.
  4. Create, execute, and analyse processes required to enable user environments and software management.
  5. Execute core system administration concepts in a professional HPC setting.
Reading List
None
Additional Information
Graduate Attributes and Skills: Not entered
Keywords: systems, cluster, performance, file system, queue, software environment, network, HPC, administration
Contacts
Course organiser: Ms Weronika Filinger
Tel: (0131 6)50 5908
Email: w.filinger@epcc.ed.ac.uk
Course secretary: Mr James Richards
Tel: (0131 6)51 3578
Email: J.Richards@epcc.ed.ac.uk