THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2020/2021

Information in the Degree Programme Tables may still be subject to change in response to Covid-19


Postgraduate Course: Advanced Topics in Foundations of Databases (INFR11122)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Postgraduate)
Availability: Available to all students
SCQF Credits: 20
ECTS Credits: 10
Summary: The course focuses on three central aspects of big data: Volume, Variety and Veracity. It will cover tractability and parallel scalability of querying big data (volume), data models and data interoperability (variety), and foundations of data quality and uncertainty (veracity). It aims to expose students to current research and development in connection with big data theory, and prepare them for conducting research in this emerging area. The course content is dynamic and continuously updated to cover the state-of-the-art in big data theory.

* Please note that this course has been replaced by a 10-credit course "Foundations of Databases" (INFR11200) from 2020/21.*
Course description
* Background: fundamental challenges introduced by querying big data; the need to revise classical computational complexity theory in the context of big data; modelling computational costs and communication costs; BD-tractability: the tractability of queries on big data; the challenges of querying data residing in multiple sources; the need to study data quality, the other side of big data.

* Volume: (1) the feasibility of computing exact query answers in big data within our available resources: parallel scalability, scale independence, techniques for making big data small; (2) approximate query answering: (a) query-driven approximation, envelopes with absolute approximation bounds, (b) data-driven approximation, synopsis-based approximate query answering, and (c) resource-bounded approximate query answering and anytime approximation.

* Variety: data can be in different formats, and come from different sources and/or applications. We shall cover: (a) popular data models, including relational, XML, and graph models, and languages for them, and (b) handling queries over data residing in multiple sources, focusing on both virtual and materialized integration, and efficient query answering.

* Veracity: big data = data quantity + data quality; (1) central issues of data quality: data consistency, data accuracy, information completeness, data currency (timeliness), entity resolution; (2) improving data quality: consistent query answering, data repairing, certain fixes; (3) knowledge bases as master data, deducing the true values of entities; (4) handling poor quality information, understanding current technologies and their deficiencies, correctness guarantees.
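
To give a flavour of the data consistency topic listed under Veracity, the following is a minimal sketch of detecting violations of a functional dependency (here, zip determines city). It is an illustration only and is not taken from the course materials; the function name `fd_violations` and the sample `customers` records are hypothetical.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return groups of rows that agree on `lhs` but disagree on `rhs`.

    `rows` is a list of dicts; `lhs` and `rhs` are lists of attribute names
    describing the functional dependency lhs -> rhs.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key].append(row)

    violations = {}
    for key, members in groups.items():
        # The dependency holds for this group only if all rhs values coincide.
        rhs_values = {tuple(r[a] for a in rhs) for r in members}
        if len(rhs_values) > 1:
            violations[key] = members
    return violations


# Hypothetical sample data: two records share a zip code but disagree on city.
customers = [
    {"name": "Ann", "zip": "EH8 9AB", "city": "Edinburgh"},
    {"name": "Bob", "zip": "EH8 9AB", "city": "Edinburg"},
    {"name": "Cat", "zip": "G12 8QQ", "city": "Glasgow"},
]

bad = fd_violations(customers, lhs=["zip"], rhs=["city"])
```

Detection of this kind only flags inconsistent groups. Deciding which value is correct is the separate problem of data repairing.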


Big data is the next frontier for innovation, competition and productivity. This course will cover fundamental issues in connection with three of the four big V's in the typical characterization of big data, namely Volume, Variety and Veracity.
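
As one illustration of the data-driven approximation listed under Volume, the sketch below estimates a SUM aggregate from a uniform random sample instead of scanning every value. It is a simple stand-in for synopsis-based techniques and is not taken from the course materials; the function name `approx_sum` and its parameters are hypothetical.

```python
import random


def approx_sum(values, sample_size, seed=0):
    """Estimate sum(values) from a uniform sample drawn without replacement.

    The sample total is scaled by n / k, where n is the data size and k the
    number of sampled values. A fixed seed keeps the sketch reproducible.
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    n = len(values)
    if n == 0:
        return 0.0
    k = min(sample_size, n)
    rng = random.Random(seed)
    sample = rng.sample(values, k)
    return sum(sample) * (n / k)


# Hypothetical data: the estimate touches 100 of the 10,000 values.
data = list(range(10_000))
estimate = approx_sum(data, sample_size=100)
exact = sum(data)
```

The estimate trades accuracy for the amount of data read. Sampling every value reproduces the exact answer, and smaller samples give cheaper but noisier estimates.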
Entry Requirements (not applicable to Visiting Students)
Pre-requisites:
Co-requisites:
Prohibited Combinations:
Other requirements: The course assumes a strong computer science background, in particular algorithm design and the ability to prove intractability. An emphasis on data management is welcome, such as relational databases and query languages.
Information for Visiting Students
Pre-requisites: None
High Demand Course? Yes
Course Delivery Information
Not being delivered
Learning Outcomes
On completion of this course, the student will be able to:
  1. Demonstrate an understanding of theory and techniques for querying big data (volume), including BD-tractability, parallel scalability, scale-independent queries, query-driven approximation and data-driven approximation.
  2. Demonstrate knowledge of coping with the variety of big data, including popular data models and languages for them, and techniques for answering queries in big data residing in multiple sources, focusing on both virtual and materialized integration.
  3. Demonstrate an understanding of techniques for improving the quality of big data (veracity): data consistency, data accuracy, data currency, information completeness, and entity resolution; data quality rule discovery, error detection, data repairing, consistent query answering, certain fixes and conflict resolution.
  4. Complete a project that solves simple research problems by providing proofs, algorithms and analysis.
  5. Write a project report and present the project in class.
Reading List
* Marcelo Arenas, Pablo Barcelo, Leonid Libkin, Filip Murlak: Foundations of Data Exchange. Cambridge University Press 2014 (a shorter Morgan & Claypool version from 2010 is available for free with institutional subscription);
* Wenfei Fan, Floris Geerts: Foundations of Data Quality Management. Morgan & Claypool Publishers 2012 (available for free with institutional subscription)
Additional Information
Course URL: http://course.inf.ed.ac.uk/atfd/
Graduate Attributes and Skills: Not entered
Keywords: Database systems, data management, big data, scalability, data exchange and integration, data quality
Contacts
Course organiser: Dr Andreas Pieris
Tel: (0131 6)51 5606
Email: apieris@inf.ed.ac.uk
Course secretary: Ms Lindsay Seal
Tel: (0131 6)50 2701
Email: lindsay.seal@ed.ac.uk