Postgraduate Course: Advanced Topics in Foundations of Databases (INFR11122)
|School||School of Informatics
|College||College of Science and Engineering
|Credit level (Normal year taken)||SCQF Level 11 (Postgraduate)
|Availability||Available to all students
|Summary||The course focuses on three central aspects of big data: Volume, Variety and Veracity. It will cover tractability and parallel scalability of querying big data (volume), data models and data interoperability (variety), and foundations of data quality and uncertainty (veracity). It aims to expose students to current research and development in big data theory, and to prepare them for conducting research in this emerging area. The course content is dynamic and continuously updated to cover the state of the art in big data theory.
*Please note that this course has been replaced by a 10-credit course "Foundations of Databases" (INFR11200) from 2020/21.*
* Background: fundamental challenges introduced by querying big data; the need to revise classical computational complexity theory in the context of big data; modelling computational costs and communication costs; BD-tractability: the tractability of queries on big data; the challenges of querying data residing in multiple sources; the need to study data quality, the other side of big data.
* Volume: (1) the feasibility of computing exact query answers in big data within our available resources: parallel scalability, scale independence, techniques for making big data small; (2) approximate query answering: (a) query-driven approximation, envelopes with absolute approximation bounds, (b) data-driven approximation, synopsis-based approximate query answering, and (c) resource-bounded approximate query answering and anytime approximation.
* Variety: data can be in different formats, and come from different sources and/or applications. We shall cover: (a) popular data models, including relational, XML, and graph models, and languages for them, and (b) handling queries over data residing in multiple sources, focusing on both virtual and materialized integration, and efficient query answering.
* Veracity: big data = data quantity + data quality; (1) central issues of data quality: data consistency, data accuracy, information completeness, data currency (timeliness), entity resolution; (2) improving data quality: consistent query answering, data repairing, certain fixes; (3) knowledge bases as master data, deducing the true values of entities; (4) handling poor quality information, understanding current technologies and their deficiencies, correctness guarantees.
Big data is the next frontier for innovation, competition and productivity. This course will cover fundamental issues in connection with three of the four big V's in the typical characterization of big data, namely Volume, Variety and Veracity.
Entry Requirements (not applicable to Visiting Students)
|Other requirements||The course assumes a strong computer science background, in particular algorithm design and the ability to prove intractability. An emphasis on data management is welcome, such as relational databases and query languages.
Information for Visiting Students
|High Demand Course?
Course Delivery Information
|Not being delivered|
On completion of this course, the student will be able to:
- Demonstrate an understanding of theory and techniques for querying big data (volume), including BD-tractability, parallel scalability, scale-independent queries, query-driven approximation and data-driven approximation.
- Demonstrate knowledge of coping with the variety of big data, including popular data models and languages for them, and techniques for answering queries in big data residing in multiple sources, focusing on both virtual and materialized integration.
- Demonstrate an understanding of techniques for improving the quality of big data (veracity): data consistency, data accuracy, data currency, information completeness, and entity resolution; data quality rule discovery, error detection, data repairing, consistent query answering, certain fixes and conflict resolution.
- Complete a project solving simple research problems by providing proofs, algorithms and analysis.
- Write a project report and present the project in class.
* Marcelo Arenas, Pablo Barceló, Leonid Libkin, Filip Murlak: Foundations of Data Exchange. Cambridge University Press 2014 (a shorter Morgan & Claypool version from 2010 is available for free with institutional subscription);
* Wenfei Fan, Floris Geerts: Foundations of Data Quality Management. Morgan & Claypool Publishers 2012 (available for free with
|Graduate Attributes and Skills
|Keywords||Database systems, data management, big data, scalability, data exchange and integration, data qualit
|Course organiser||Dr Andreas Pieris
Tel: (0131 6)51 5606
|Course secretary||Ms Lindsay Seal
Tel: (0131 6)50 2701