Timetable information in the Course Catalogue may be subject to change.

Postgraduate Course: Extended Statistical Programming (MATH11242)

Course Outline
School: School of Mathematics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Postgraduate)
Availability: Available to all students
SCQF Credits: 20
ECTS Credits: 10
Summary: The course covers the fundamentals of statistical programming, using the R language for practical work. The aims are:
1. To teach good programming practice: design, structure, documentation/commenting, testing, debugging, version control and reproducibility.
2. To teach the key programming skills and methods required for statistics and data science: stochastic simulation, visualization, data handling, matrix computation and linear modelling, likelihood and optimization, bootstrapping and Bayesian stochastic simulation.
Course description: This course is designed for students on the Statistics with Data Science MSc. It prepares students for the practical computational aspects of the MSc and for future work in statistics and data science. The aim is for students to learn structured, reproducible programming using the R statistical computing language, and to acquire a basic skill set in the core elements of statistical computing.

The outline syllabus is:

Git and GitHub
Setting up a repo on GitHub; using git; making a local working copy of the repo; modifying work and synchronising with the GitHub repo; simple work cycle; simple conflict resolution; adding and deleting files with git; more advanced use.

Programming for statistical data analysis
- basic principles
- what is programming
- what makes an analysis statistical

Getting started with R
A first R session; dissecting a simple programming example; data are not always numbers

A more systematic look at R
Objects, classes, attributes; data structures; attributes; operators; loops and conditional execution; functions; '...' in R; pipes; planning and coding; vectorization; useful built-in functions; apply.

Simulation I
Random sampling building blocks: sampling data, sampling from distributions; simulation from stochastic models; statistical simulation studies.

Reading and storing data in files
Working directories; reading code; reading and writing text data; reading and writing binary files; reading from other sources.

Data re-arrangement and tidy data
Concept of tidy data; data tidying; regular expressions.

Statistical Modelling: linear models
Linear model as a prototype statistical model; basic model concepts; interactions; computing with linear models; model matrix; model formulae; fitting linear models in R.

R classes
Concepts of classes and object orientation; S3 methods in R.

Matrix computation
Ordering and efficiency; general solution of linear systems; Cholesky, forward and backsolve; QR; pivoting and triangular factorizations; symmetric eigen decomposition and PCA; SVD.

Design, Debug, Test and Profile
Design before you code; worked example; testing; debugging and debuggers; profiling flops and memory.

Maximum Likelihood Estimation
Concept; statement of large sample results; what they mean.

Numerical Optimization
Newton's method; quasi-Newton; Nelder-Mead; R optimization functions; simple constraints on parameters; getting derivatives.

Graphics
Systematic look at graphics already encountered; scatterplots in ggplot and base R; plotting to file; univariate plots; boxplots etc.; 3D plotting.

Simulation II: simulation for inference
Bootstrapping: generating from distributions and the nonparametric bootstrap; bootstrapping multivariate data; MCMC for Bayesian inference; Metropolis-Hastings; Gibbs; graphical models, DAGs and automatic Gibbs; JAGS.

R markdown
Basic overview of R markdown for reproducible data analysis.
Entry Requirements (not applicable to Visiting Students)
Pre-requisites Co-requisites
Prohibited Combinations: Students MUST NOT also be taking Statistical Programming (MATH11176)
Other requirements: This course is only available to students on a Mathematics MSc or MSc Data Science (Informatics).
Information for Visiting Students
High Demand Course? Yes
Course Delivery Information
Academic year: 2023/24, Available to all students (SV1)
Quota: None
Course Start: Semester 1
Learning and Teaching activities: Total Hours: 200 (Lecture Hours 22, Supervised Practical/Workshop/Studio Hours 33, Summative Assessment Hours 60, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 81)
Assessment: Written Exam 0%, Coursework 100%, Practical Exam 0%
Additional Information (Assessment): Coursework: 100%; Examination: 0%
Feedback: Not entered
No Exam Information
Learning Outcomes
On completion of this course, the student will be able to:
  1. Write reasonably efficient, well structured and documented computer programs in R.
  2. Write efficient implementations of statistical methods.
  3. Process data effectively, in particular preparing data for analysis and visualizing data.
  4. Show appreciation of reliable and reproducible computational methods, and the nature of a statistical analysis.
  5. Demonstrate expertise in commonly used statistical computing methods.
Reading List
Advanced R. Hadley Wickham
Core Statistics. Simon Wood
Additional Information
Graduate Attributes and Skills: Not entered
Keywords: Extended, Statistics, programming, data science
Course organiser: Prof Simon Wood
Course secretary: Mr Jack Draper