Postgraduate Course: Extended Statistical Programming (MATH11242)
|School||School of Mathematics
||College||College of Science and Engineering
|Credit level (Normal year taken)||SCQF Level 11 (Postgraduate)
||Availability||Available to all students
|Summary||The course covers the fundamentals of Statistical Programming,
using the R language for practical work. The aims are:
1. To teach good programming practice: design, structure, documentation/commenting, testing, debugging, version control and reproducibility.
2. To teach the key programming skills and methods required for statistics and data science. These are stochastic simulation, visualization, data handling, matrix computation and linear modelling, likelihood and optimization, bootstrapping and Bayesian stochastic simulation.
This course is designed for students on the Statistics with Data Science MSc. It prepares students for the practical computational aspects of the MSc and for future work in Statistics and Data Science. The aim is for students to learn structured, reproducible programming using the R statistical computing language, and to acquire a basic skill set in the core elements of statistical computing.
The outline content of the syllabus is:
Git and GitHub
Setting up a repo on GitHub; using Git; making a local working copy of the repo; modifying work and synchronising with the GitHub repo; simple work cycle; simple conflict resolution; adding and deleting files with Git; more advanced use.
Programming for statistical data analysis
- basic principles
- what is programming
- what makes an analysis statistical
Getting started with R
A first R session; dissecting a simple programming example; data are not always numbers
A more systematic look at R
Objects, classes, attributes; data structures; operators; loops and conditional execution; functions; '...' in R; pipes; planning and coding; vectorization; useful built-in functions; apply.
Simulation I
Random sampling building blocks: sampling data, sampling from distributions; simulation from stochastic models; statistical simulation studies.
Reading and storing data in files
Working directories; reading code; reading and writing text data; reading and writing binary files; reading from other sources.
Data re-arrangement and tidy data
Concept of tidy data; data tidying; regular expressions.
Statistical Modelling: linear models
Linear model as a prototype statistical model; basic model concepts; interactions; computing with linear models; model matrix; model formulae; fitting linear models in R.
Concepts of classes and object orientation; S3 methods in R.
Matrix computation
Ordering and efficiency; general solution of linear systems; Cholesky, forward and backsolve; QR; pivoting and triangular factorizations; symmetric eigen decomposition and PCA; SVD.
Design, Debug, Test and Profile
Design before you code; worked example; testing; debugging and debuggers; profiling flops and memory.
Maximum Likelihood Estimation
Concept; statement of large sample results; what they mean.
Newton's method; quasi-Newton; Nelder-Mead; R optimization functions; simple constraints on parameters; getting derivatives.
Systematic look at graphics already encountered; scatterplots in ggplot and base R; plotting to file; univariate plots; boxplots etc.; 3D plotting.
Simulation II: simulation for inference
Bootstrapping: generating from distributions and the nonparametric bootstrap, bootstrapping multivariate data; MCMC for Bayesian inference; Metropolis-Hastings; Gibbs; graphical models, DAGs and automatic Gibbs; JAGS.
Basic overview of R Markdown for reproducible data analysis.
Entry Requirements (not applicable to Visiting Students)
|Prohibited Combinations|| Students MUST NOT also be taking Statistical Programming (MATH11176)
||Other requirements|| * This course is only available to students on a Mathematics MSc or MSc Data Science (Informatics)*
Information for Visiting Students
|High Demand Course?
Course Delivery Information
|Academic year 2023/24, Available to all students (SV1)
|Learning and Teaching activities (Further Info)
Lecture Hours 22,
Supervised Practical/Workshop/Studio Hours 33,
Summative Assessment Hours 60,
Programme Level Learning and Teaching Hours 4,
Directed Learning and Independent Learning Hours
|Assessment (Further Info)
|Additional Information (Assessment)
||Coursework: 100%
Examination: 0%
|No Exam Information
On completion of this course, the student will be able to:
- Write reasonably efficient, well-structured and documented computer programs in R.
- Write efficient implementations of statistical methods.
- Process data effectively, in particular preparing data for analysis and visualizing data.
- Show appreciation of reliable and reproducible computational methods, and the nature of a statistical analysis.
- Demonstrate expertise in commonly used statistical computing methods.
|Advanced R. Hadley Wickham|
Core Statistics. Simon Wood
|Graduate Attributes and Skills
|Course organiser||Prof Simon Wood
|Course secretary||Mr Jack Draper