Postgraduate Course: Using R for Data Science (PGBI11122)
Course Outline
School | School of Biological Sciences |
College | College of Science and Engineering |
Credit level (Normal year taken) | SCQF Level 11 (Postgraduate) |
Availability | Not available to visiting students |
SCQF Credits | 10 |
ECTS Credits | 5 |
Summary | R is an environment and a language for data analysis and statistics. R provides a generic set of tools that can be applied to problems in many areas of data science, as well as in related areas such as bioinformatics and genomics. This course explores the rich set of tools that R provides and how, in practice, these tools can be applied to solve real, complex biological problems. As part of the course we introduce a biological model problem in detail (currently a problem from regulatory genomics) and show how to build complex analysis processes in R and apply machine learning techniques to model the data. |
Course description |
The course is taught from first principles and no previous experience of R is required. It begins with an introduction to R and RStudio, the biological background to the model question, and the specific machine learning methods used on the course. It then explores the R programming model and how scripts and workflows are written in R in practice. Next, the course explores how interactive graphical applications can be built using R and Shiny, and how complex relational data can be exploited using R. It covers how data can be imported and processed in R (data cleaning and wrangling) and, ultimately, how interactive workflows can be run using cluster computing (using Apache Spark). Finally, the course explores detailed data visualisation and plotting of results (e.g. using ggplot2) and how the analysis outcome can be interpreted in the context of the motivating biological problem.
This course is designed to complement other existing courses that make extensive use of R, such as Statistics and Data Analysis (PGBI11003) or Functional Genomic Technologies (PGBI11040). There is a strong focus on developing practical, generic skills in R that can then be applied to a wide range of biological problems. |
Entry Requirements (not applicable to Visiting Students)
Pre-requisites | |
Co-requisites | |
Prohibited Combinations | |
Other requirements | None |
Course Delivery Information
Academic year 2021/22, Not available to visiting students (SS1) | Quota: 60 |
Course Start | Semester 1 |
Timetable |
Learning and Teaching activities | Total Hours: 100 (Lecture Hours 30, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 68) |
Assessment | Written Exam 50%, Coursework 50%, Practical Exam 0% |
Additional Information (Assessment) |
In-course assessment (50%) and exam (50%). The in-course assessment will be a generalised analysis task using R applying methodologies taught in the course. The exam will be made up of three questions: one compulsory and two optional. |
Feedback |
Assignment marks and written feedback will be provided fifteen working days after submission.
Exam marks and written feedback will be provided after mark ratification at the semester 2 Board of Examiners. |
Exam Information |
Exam Diet | Paper Name | Hours & Minutes |
Main Exam Diet S1 (December) | | 2:00 |
Learning Outcomes
On completion of this course, the student will be able to:
- Implement in software a complex data analysis task in R.
- Choose the appropriate analysis strategy to address a particular analysis question.
- Interpret a complex analysis output in terms of the experimental hypothesis and the specific biological context.
Reading List
R for Data Science, Hadley Wickham & Garrett Grolemund
Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Editors: Robert Gentleman, Vincent J. Carey, Wolfgang Huber, Rafael A. Irizarry, Sandrine Dudoit |
Additional Information
Graduate Attributes and Skills |
SCQF Level 11, Characteristic 2 - Practice: applied knowledge, skills and understanding. For example, develop original and creative responses to problems and issues.
SCQF Level 11, Characteristic 3 - Generic cognitive skills. For example, knowledge that covers and integrates most, if not all, of the main areas of the subject/discipline/sector, including their features, boundaries, terminology and conventions.
SCQF Level 11, Characteristic 4 - Communication, numeracy and ICT skills. For example, use a wide range of ICT applications to support and enhance work at this level and adjust features to suit purpose.
SCQF Level 11, Characteristic 5 - Autonomy, accountability and working with others. For example, exercise substantial initiative in professional and equivalent activities and take responsibility for own work. |
Keywords | Bioinformatics, R, Data Science |
Contacts
Course organiser | Dr Simon Tomlinson
Tel: (0131 6)51 7252
Email: simon.tomlinson@ed.ac.uk |
Course secretary | Ms Louise Robertson
Tel: (0131 6)50 5988
Email: Louise.K.M.Robertson@ed.ac.uk |