Postgraduate Course: Advanced Topics in Foundations of Databases (INFR11122)
|School||School of Informatics
||College||College of Science and Engineering
|Credit level (Normal year taken)||SCQF Level 11 (Postgraduate)
||Availability||Available to all students
|Summary||The course focuses on three central aspects of big data: Volume, Variety and Veracity. It will cover tractability and parallel scalability of querying big data (volume), data models and data interoperability (variety), and foundations of data quality and uncertainty (veracity). It aims to expose students to current research and development in connection with big data theory, and prepare them for conducting research in this emerging area. The course content is dynamic and continuously updated to cover the state-of-the-art in big data theory.
* Background: Fundamental challenges introduced by querying big data; the need for revising the classical computational complexity theory in the context of big data; modelling computational costs and communication costs; BD-tractability: the tractability of queries on big data; the challenges of querying data residing in multiple sources; the need to study data quality, the other side of big data.
* Volume: (1) the feasibility of computing exact query answers in big data within our available resources: parallel scalability, scale independence, techniques for making big data small; (2) approximate query answering: (a) query-driven approximation, envelopes with absolute approximation bounds, (b) data-driven approximation, synopsis-based approximate query answering, and (c) resource-bounded approximate query answering and anytime approximation.
* Variety: data can be in different formats, and come from different sources and/or applications. We shall cover: (a) popular data models, including relational, XML, and graph models, and languages for them, and (b) handling queries over data residing in multiple sources, focusing on both virtual and materialized integration, and efficient query answering.
* Veracity: big data = data quantity + data quality; (1) central issues of data quality: data consistency, data accuracy, information completeness, data currency (timeliness), entity resolution; (2) improving data quality: consistent query answering, data repairing, certain fixes; (3) knowledge bases as master data, deducing the true values of entities; (4) handling poor quality information, understanding current technologies and their deficiencies, correctness guarantees.
Big data is the next frontier for innovation, competition and productivity. This course will cover fundamental issues in connection with three of the four big V's in the typical characterization of big data, namely Volume, Variety and Veracity.
Entry Requirements (not applicable to Visiting Students)
|Prohibited Combinations|| Students MUST NOT also be taking
Topics in Distributed Databases (INFR11025) OR
Data Integration and Exchange (Level 11) (INFR11058)
||Other requirements|| The course assumes a strong computer science background, in particular algorithm design and the ability to prove intractability. A background in data management, such as relational databases and query languages, is welcome.
Information for Visiting Students
|High Demand Course?
Course Delivery Information
|Academic year 2015/16, Available to all students (SV1)
|Learning and Teaching activities (Further Info)
Lecture Hours 14,
Seminar/Tutorial Hours 6,
Programme Level Learning and Teaching Hours 4,
Directed Learning and Independent Learning Hours
|Assessment (Further Info)
|Additional Information (Assessment)
||For proper evaluation, students must be presented with real problems, rather than 'toy' ones which can be solved in a very limited time. A list of projects will be given out at the beginning of the semester, from which each student will pick one. The project is research-oriented: the student solves a simple research problem by developing algorithms, proofs and analyses. Each student is expected to present their report in class.
The students will deliver their work in five instalments:
- an essay on the volume of big data at the end of Week 3 (worth 15%);
- an essay on the variety of big data at the end of Week 6 (worth 15%);
- an essay on the veracity of big data at the end of Week 9 (worth 15%);
- a final project report (worth 40%);
- a presentation of their work at the end of the semester (worth 15%).
Students are expected to spend around 90 hours working independently on their assessed coursework, outside of lectures and direct supervision hours. This includes writing reports during the semester and preparing an oral presentation of their project work to the class.
* Three essays: 24 hours in total, 8 hours each;
* Project: 60 hours;
* Oral presentation preparation: 6 hours.
|No Exam Information
On completion of this course, the student will be able to:
- Demonstrate an understanding of theory and techniques for querying big data (volume), including BD-tractability, parallel scalability, scale-independent queries, query-driven approximation and data-driven approximation.
- Demonstrate knowledge of coping with the variety of big data, including popular data models and languages for them, and techniques for answering queries in big data residing in multiple sources, focusing on both virtual and materialized integration.
- Demonstrate an understanding of techniques for improving the quality of big data (veracity): data consistency, data accuracy, data currency, information completeness, and entity resolution; data quality rule discovery, error detection, data repairing, consistent query answering, certain fixes and conflict resolution.
- Complete a project that solves a simple research problem by providing proofs, algorithms and analyses.
- Write a project report and present the project in class.
|* Marcelo Arenas, Pablo Barcelo, Leonid Libkin, Filip Murlak: Foundations of Data Exchange. Cambridge University Press 2014 (a shorter Morgan & Claypool version from 2010 is available for free with institutional subscription);|
* Wenfei Fan, Floris Geerts: Foundations of Data Quality Management. Morgan & Claypool Publishers 2012 (available for free with
|Graduate Attributes and Skills
|Keywords||Database systems, data management, big data, scalability, data exchange and integration, data quality
|Course organiser||Prof Leonid Libkin
Tel: (0131 6)51 3816
|Course secretary||Miss Maree Matheson
Tel: (0131 6)50 9989
© Copyright 2015 The University of Edinburgh - 18 January 2016 4:13 am