Postgraduate Course: Extended Statistical Programming (MATH11242)
Course Outline
School  School of Mathematics 
College  College of Science and Engineering 
Credit level (Normal year taken)  SCQF Level 11 (Postgraduate) 
Availability  Not available to visiting students 
SCQF Credits  20 
ECTS Credits  10 
Summary  The course covers the fundamentals of Statistical Programming,
using the R language for practical work. The aims are
1. To teach good programming practice: design, structure, documentation/commenting, testing, debugging, version control and reproducibility.
2. To teach the key programming skills and methods required for statistics and data science. These are stochastic simulation, visualization, data handling, matrix computation and linear modelling, likelihood and optimization, bootstrapping and Bayesian stochastic simulation. 
Course description 
This course is designed for MSc students on the Statistics with Data Science MSc. It prepares students for the practical computational aspects of the MSc and future work in Statistics and Data Science. The aim is for students to learn structured, reproducible programming using the R statistical computing language, and to acquire a basic skill set in the core elements of statistical computing.
The outline content of the syllabus is:
Git and GitHub
Setting up a repo on GitHub; using git; making a local working copy of the repo; modifying work and synchronising with the GitHub repo; simple work cycle; simple conflict resolution; adding and deleting files with git; more advanced use.
Programming for statistical data analysis
 basic principles
 what is programming
 what makes an analysis statistical
Getting started with R
A first R session; dissecting a simple programming example; data are not always numbers
A more systematic look at R
Objects, classes, attributes; data structures; attributes; operators; loops and conditional execution; functions; '...' in R; pipes; planning and coding; vectorization; useful built-in functions; apply.
Simulation I
Random sampling building blocks: sampling data, sampling from distributions; simulation from stochastic models; statistical simulation studies.
Reading and storing data in files
Working directories; reading code; reading and writing text data; reading and writing binary files; reading from other sources.
Data rearrangement and tidy data
Concept of tidy data; data tidying; regular expressions.
Statistical Modelling: linear models
Linear model as a prototype statistical model; basic model concepts; interactions; computing with linear models; model matrix; model formulae; fitting linear models in R.
R classes
Concepts of classes and object orientation; S3 methods in R.
Matrix computation
Ordering and efficiency; general solution of linear systems; Cholesky, forward and backsolve; QR; pivoting and triangular factorizations; symmetric eigen decomposition and PCA; SVD.
Design, Debug, Test and Profile
Design before you code; worked example; testing; debugging and debuggers; profiling flops and memory.
Maximum Likelihood Estimation
Concept; statement of large sample results; what they mean.
Numerical Optimization
Newton's method; quasi-Newton; Nelder-Mead; R optimization functions; simple constraints on parameters; getting derivatives.
Graphics
Systematic look at graphics already encountered; scatterplots in ggplot and base R; plotting to file; univariate plots; boxplots etc.; 3D plotting.
Simulation II: simulation for inference
Bootstrapping: generating from distributions and the nonparametric bootstrap, bootstrapping multivariate data; MCMC for Bayesian inference; Metropolis-Hastings; Gibbs; graphical models, DAGs and automatic Gibbs; JAGS.
R markdown
Basic overview of R markdown for reproducible data analysis.

Entry Requirements (not applicable to Visiting Students)
Prerequisites 

Corequisites  
Prohibited Combinations  Students MUST NOT also be taking
Statistical Programming (MATH11176)

Other requirements  This course is only available to students on a Mathematics MSc or MSc Data Science (Informatics).
Note that PGT students on School of Mathematics MSc programmes are not required to have taken prerequisite courses, but they are advised to check that they have studied the material covered in the syllabus of each prerequisite course before enrolling. 
Course Delivery Information

Academic year 2024/25, Not available to visiting students (SS1)

Quota: None 
Course Start 
Semester 1 
Timetable 
Learning and Teaching activities 
Total Hours: 200 (Lecture Hours 22, Supervised Practical/Workshop/Studio Hours 33, Summative Assessment Hours 60, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 81)

Assessment 
Written Exam 0%, Coursework 100%, Practical Exam 0%

Additional Information (Assessment) 
Coursework: 100%
Examination: 0% 
Feedback 
Not entered 
No Exam Information 
Learning Outcomes
On completion of this course, the student will be able to:
 Write reasonably efficient, well structured and documented computer programs in R.
 Write efficient implementations of statistical methods.
 Be able to process data effectively, in particular preparing data for analysis and visualizing data.
 Show appreciation of reliable and reproducible computational methods, and the nature of a statistical analysis.
 Demonstrate expertise in commonly used statistical computing methods.

Reading List
Advanced R. Hadley Wickham
Core Statistics. Simon Wood 
Additional Information
Graduate Attributes and Skills 
Not entered 
Keywords  Extended, Statistics, programming, data science 
Contacts
Course organiser  Prof Simon Wood
Tel:
Email: simon.wood@ed.ac.uk 
Course secretary  Mr Jack Draper
Tel:
Email: v1jdrape@ed.ac.uk 

