Postgraduate Course: Extended Statistical Programming (MATH11242)
Course Outline
School | School of Mathematics |
College | College of Science and Engineering |
Credit level (Normal year taken) | SCQF Level 11 (Postgraduate) |
Availability | Not available to visiting students |
SCQF Credits | 20 |
ECTS Credits | 10 |
Summary | The course covers the fundamentals of Statistical Programming, using the R language for practical work. The aims are:
1. To teach good programming practice: design, structure, documentation/commenting, testing, debugging, version control and reproducibility.
2. To teach the key programming skills and methods required for statistics and data science. These are stochastic simulation, visualization, data handling, matrix computation and linear modelling, likelihood and optimization, bootstrapping and Bayesian stochastic simulation. |
Course description |
This course is designed for students on the Statistics with Data Science MSc. It prepares students for the practical computational aspects of the MSc and for future work in Statistics and Data Science. The aim is for students to learn structured, reproducible programming using the R statistical computing language, and to acquire a basic skill set in the core elements of statistical computing.
The outline syllabus is:
Git and GitHub
Setting up a repo on GitHub; using git; making a local working copy of the repo; modifying work and synchronising with the GitHub repo; simple work cycle; simple conflict resolution; adding and deleting files with git; more advanced use.
Programming for statistical data analysis
- basic principles
- what is programming
- what makes an analysis statistical
Getting started with R
A first R session; dissecting a simple programming example; data are not always numbers
A more systematic look at R
Objects, classes, attributes; data structures; operators; loops and conditional execution; functions; '...' in R; pipes; planning and coding; vectorization; useful built-in functions; apply.
Simulation I
Random sampling building blocks: sampling data, sampling from distributions; simulation from stochastic models; statistical simulation studies.
Reading and storing data in files
Working directories; reading code; reading and writing text data; reading and writing binary files; reading from other sources.
Data re-arrangement and tidy data
Concept of tidy data; data tidying; regular expressions.
Statistical Modelling: linear models
Linear model as a prototype statistical model; basic model concepts; interactions; computing with linear models; model matrix; model formulae; fitting linear models in R.
R classes
Concepts of classes and object orientation; S3 methods in R.
Matrix computation
Ordering and efficiency; general solution of linear systems; Cholesky, forward and backsolve; QR; pivoting and triangular factorizations; symmetric eigen decomposition and PCA; SVD.
Design, Debug, Test and Profile
Design before you code; worked example; testing; debugging and debuggers; profiling flops and memory.
Maximum Likelihood Estimation
Concept; statement of large sample results; what they mean.
Numerical Optimization
Newton's method; quasi-Newton; Nelder-Mead; R optimization functions; simple constraints on parameters; getting derivatives.
Graphics
Systematic look at graphics already encountered; scatterplots in ggplot and base R; plotting to file; univariate plots; boxplots etc.; 3D plotting.
Simulation II: simulation for inference
Bootstrapping: generating from distributions and the nonparametric bootstrap; bootstrapping multivariate data; MCMC for Bayesian inference; Metropolis-Hastings; Gibbs; graphical models, DAGs and automatic Gibbs; JAGS.
R markdown
Basic overview of R markdown for reproducible data analysis.
|
Entry Requirements (not applicable to Visiting Students)
Pre-requisites |
|
Co-requisites | |
Prohibited Combinations | Students MUST NOT also be taking
Statistical Programming (MATH11176)
|
Other requirements | * This course is only available to students on a Mathematics MSc or MSc Data Science (Informatics)*
Note that PGT students on School of Mathematics MSc programmes are not required to have taken pre-requisite courses, but they are advised to check that they have studied the material covered in the syllabus of each pre-requisite course before enrolling. |
Course Delivery Information
|
Academic year 2024/25, Not available to visiting students (SS1)
|
Quota: None |
Course Start |
Semester 1 |
Timetable |
Learning and Teaching activities (Further Info) |
Total Hours: 200 (Lecture Hours 22, Supervised Practical/Workshop/Studio Hours 33, Summative Assessment Hours 60, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 81)
|
Assessment (Further Info) |
Written Exam 0%, Coursework 100%, Practical Exam 0%
|
Additional Information (Assessment) |
Coursework: 100%
Examination: 0% |
Feedback |
Not entered |
No Exam Information |
Learning Outcomes
On completion of this course, the student will be able to:
- Write reasonably efficient, well structured and documented computer programs in R.
- Write efficient implementations of statistical methods.
- Process data effectively, in particular preparing data for analysis and visualizing data.
- Show appreciation of reliable and reproducible computational methods, and the nature of a statistical analysis.
- Demonstrate expertise in commonly used statistical computing methods.
|
Reading List
Advanced R. Hadley Wickham
Core Statistics. Simon Wood |
Additional Information
Graduate Attributes and Skills |
Not entered |
Keywords | Extended, Statistics, programming, data science |
Contacts
Course organiser | Prof Simon Wood
Tel:
Email: simon.wood@ed.ac.uk |
Course secretary | Miss Kirstie Paterson
Tel:
Email: Kirstie.Paterson@ed.ac.uk |
|
|