Postgraduate Course: Foundations of Databases (INFR11200)
|School||School of Informatics
||College||College of Science and Engineering
|Credit level (Normal year taken)||SCQF Level 11 (Postgraduate)
||Availability||Available to all students
|Summary||This course is a revised and simplified version of the course Advanced Topics in Foundations of Databases (INFR11122) which was offered until 2019-20.
Data is everywhere, coming in different shapes and volumes, and needs to be stored and managed using appropriate data management technologies. The basic software package that supports the management of data is called a database management system (DBMS). The main goal of this course is to explain some of the underlying theoretical principles and characteristics of DBMSs. More precisely, this course will explain how real-life concepts (such as a database and a query) and phenomena (such as incompleteness and inconsistency of data), can be abstracted from their physical implementation and mathematically formalised using tools coming from other areas such as computational logic. This will pave the way towards the study of query evaluation, that is, the central task of extracting meaningful information from (possibly incomplete and inconsistent) data by means of queries, following a mathematically rigorous approach. This analysis will expose the source of complexity in evaluating a query over a database, which in turn provides ideas and tools on how to devise more efficient query evaluation algorithms.
The course will cover the following topics:
- Relational model: data model, relational algebra, relational calculus (first-order queries), first-order query evaluation, static analysis of first-order queries (satisfiability and containment).
- Conjunctive queries (CQs): syntax and semantics (via the notion of homomorphism), CQ evaluation, static analysis of CQs (satisfiability, containment and the Homomorphism Theorem), minimisation of CQs.
- Fast conjunctive query evaluation: acyclic CQs, evaluating acyclic CQs (Yannakaki's algorithm), semantically acyclic CQs and their evaluation, size bounds for joins (AGM bound), worst-case optimal join algorithms.
- Adding recursion - Datalog: inexpressibility of recursive queries, syntax and semantics of Datalog, Datalog query evaluation, static analysis of Datalog queries, Datalog vs. first-order queries.
- Uncertainty - reasoning over possible worlds: incomplete databases, inconsistent databases, probabilistic databases, knowledge-enriched databases.
Student will be assessed 100% by in-course assessment:
- Essay 1 (formative): students will choose a research paper (from a given list) on the relational model, and present (i) a summary of the paper, and (ii) analysis and critical thoughts (criticism of the paper, and discussion on follow-up papers that show how the ideas of the paper under review have influenced the field). The length of the essay must be 5 to 7 pages (including references). Each essay should be clearly written in sentences with appropriate punctuation, display of formulae, appropriate use of "Definition", "Lemma", "Theorem", "Proof", etc. The work should be properly and adequately referenced in the text, with the full list of references at the end of the essay, following any of the standard labelling conventions as technical papers (e.g., numerical, or by abbreviated name). The essay will be marked on its clarity and technical accuracy, which are equally important criteria. It must be understood by someone who has not read the paper. Of course, in reality, it will be marked by someone who has read the paper, but still this is an important criterion that will be used in marking.
- Essay 2 (50%): as for Essay 1, but with the difference that students will choose a research paper on a topic covered during the lectures (not only on the relational model as for Essay 1). The essay will be marked on its clarity (25%) and technical accuracy (25%).
- Take-home test (50%): students will complete a take-home test at the end of the course, which will consist of three problems on relational queries, Datalog queries, and uncertain data. The problems will be of variable difficulty (easy - 10%, medium - 15%, hard - 25%). For the hard problem, a hint that will guide the student towards the final solution will be given. The completion of the take-home assignment will take no more than 25 hours.
Entry Requirements (not applicable to Visiting Students)
|| It is RECOMMENDED that students have passed
Database Systems (INFR10070) OR
Introduction to Databases (INFR10080)
||Other requirements|| While there are no formal prerequisites, it is recommended that students have passed an introductory course in
Databases such as the undergraduate course Database Systems (INFR10070) or Introduction to Databases (INFR10080); in particular, some familiarity with the relational model, the main relational query languages (calculus and algebra), and integrity constraints is welcome. It is also recommended that students have some basic familiarity with complexity theory (standard complexity classes such as PTIME and NP, and the notion of completeness). In any case, this course is self-contained, and all the necessary tools will be properly introduced and explained during the lectures.
Information for Visiting Students
|Pre-requisites||Same as other requirements
|High Demand Course?
Course Delivery Information
|Not being delivered|
On completion of this course, the student will be able to:
- Abstract relational data and relational queries from their physical implementation, and formalise them in a rigorous way.
- Analyse the complexity of querying relational data, and isolate the source of complexity.
- Explain the semantics of Datalog queries, analyse the complexity of evaluating Datalog queries, and model real-life queries in a declarative way.
- Formalise uncertain data, analyse the complexity of querying uncertain data, and explain the reasons that lead to intractability.
- Read and summarise research papers.
|- Abiteboul, Hull, Vianu, Foundations of Databases, 1995|
- Libkin, Elements of Finite Model Theory, 2012
- Bertossi, Database Repairing and Consistent Query answering, 2011
- Suciu, Olteanu, Re, Koch, Probabilistic Databases, 2011
|Graduate Attributes and Skills
||Problem-solving, analytical thinking, independent learning, written communication.
||While there are no formal prerequisites, it is recommended that students have passed an introductory course in Databases such as the undergraduate course Introduction to Databases; in particular, some familiarity with the relational model, the main relational query languages (calculus and algebra), and integrity constraints is welcome. It is also recommended that students have some basic familiarity with complexity theory (standard complexity classes such as PTIME and NP, and the notion of completeness). In any case, this course is self-contained, and all the necessary tools will be properly introduced and explained during the lectures.
|Keywords||Relational data,Relational queries,Query evaluation,Static analysis of queries,Fast query evaluation
|Course organiser||Dr Andreas Pieris
Tel: (0131 6)51 5606
|Course secretary||Ms Lindsay Seal
Tel: (0131 6)50 2701