Postgraduate Course: Foundations of Databases (INFR11200)
Course Outline
School  School of Informatics 
College  College of Science and Engineering 
Credit level (Normal year taken)  SCQF Level 11 (Postgraduate) 
Availability  Available to all students 
SCQF Credits  10 
ECTS Credits  5 
Summary  This course is a revised and simplified version of the course Advanced Topics in Foundations of Databases (INFR11122) which was offered until 201920.
Data is everywhere, coming in different shapes and volumes, and needs to be stored and managed using appropriate data management technologies. The basic software package that supports the management of data is called a database management system (DBMS). The main goal of this course is to explain some of the underlying theoretical principles and characteristics of DBMSs. More precisely, this course will explain how reallife concepts (such as a database and a query) and phenomena (such as incompleteness and inconsistency of data), can be abstracted from their physical implementation and mathematically formalised using tools coming from other areas such as computational logic. This will pave the way towards the study of query evaluation, that is, the central task of extracting meaningful information from (possibly incomplete and inconsistent) data by means of queries, following a mathematically rigorous approach. This analysis will expose the source of complexity in evaluating a query over a database, which in turn provides ideas and tools on how to devise more efficient query evaluation algorithms. 
Course description 
The course will cover the following topics:
 Relational model: data model, relational algebra, relational calculus (firstorder queries), firstorder query evaluation, static analysis of firstorder queries (satisfiability and containment).
 Conjunctive queries (CQs): syntax and semantics (via the notion of homomorphism), CQ evaluation, static analysis of CQs (satisfiability, containment and the Homomorphism Theorem), minimisation of CQs.
 Fast conjunctive query evaluation: acyclic CQs, evaluating acyclic CQs (Yannakaki's algorithm), semantically acyclic CQs and their evaluation, size bounds for joins (AGM bound), worstcase optimal join algorithms.
 Adding recursion  Datalog: inexpressibility of recursive queries, syntax and semantics of Datalog, Datalog query evaluation, static analysis of Datalog queries, Datalog vs. firstorder queries.
 Uncertainty  reasoning over possible worlds: incomplete databases, inconsistent databases, probabilistic databases, knowledgeenriched databases.
Student will be assessed 100% by incourse assessment:
 Essay 1 (formative): students will choose a research paper (from a given list) on the relational model, and present (i) a summary of the paper, and (ii) analysis and critical thoughts (criticism of the paper, and discussion on followup papers that show how the ideas of the paper under review have influenced the field). The length of the essay must be 5 to 7 pages (including references). Each essay should be clearly written in sentences with appropriate punctuation, display of formulae, appropriate use of "Definition", "Lemma", "Theorem", "Proof", etc. The work should be properly and adequately referenced in the text, with the full list of references at the end of the essay, following any of the standard labelling conventions as technical papers (e.g., numerical, or by abbreviated name). The essay will be marked on its clarity and technical accuracy, which are equally important criteria. It must be understood by someone who has not read the paper. Of course, in reality, it will be marked by someone who has read the paper, but still this is an important criterion that will be used in marking.
 Essay 2 (50%): as for Essay 1, but with the difference that students will choose a research paper on a topic covered during the lectures (not only on the relational model as for Essay 1). The essay will be marked on its clarity (25%) and technical accuracy (25%).
 Takehome test (50%): students will complete a takehome test at the end of the course, which will consist of three problems on relational queries, Datalog queries, and uncertain data. The problems will be of variable difficulty (easy  10%, medium  15%, hard  25%). For the hard problem, a hint that will guide the student towards the final solution will be given. The completion of the takehome assignment will take no more than 25 hours.

Entry Requirements (not applicable to Visiting Students)
Prerequisites 
It is RECOMMENDED that students have passed
Database Systems (INFR10070) OR
Introduction to Databases (INFR10080)

Corequisites  
Prohibited Combinations  
Other requirements  While there are no formal prerequisites, it is recommended that students have passed an introductory course in
Databases such as the undergraduate course Database Systems (INFR10070) or Introduction to Databases (INFR10080); in particular, some familiarity with the relational model, the main relational query languages (calculus and algebra), and integrity constraints is welcome. It is also recommended that students have some basic familiarity with complexity theory (standard complexity classes such as PTIME and NP, and the notion of completeness). In any case, this course is selfcontained, and all the necessary tools will be properly introduced and explained during the lectures. 
Information for Visiting Students
Prerequisites  Same as other requirements 
High Demand Course? 
Yes 
Course Delivery Information
Not being delivered 
Learning Outcomes
On completion of this course, the student will be able to:
 Abstract relational data and relational queries from their physical implementation, and formalise them in a rigorous way.
 Analyse the complexity of querying relational data, and isolate the source of complexity.
 Explain the semantics of Datalog queries, analyse the complexity of evaluating Datalog queries, and model reallife queries in a declarative way.
 Formalise uncertain data, analyse the complexity of querying uncertain data, and explain the reasons that lead to intractability.
 Read and summarise research papers.

Reading List
 Abiteboul, Hull, Vianu, Foundations of Databases, 1995
 Libkin, Elements of Finite Model Theory, 2012
 Bertossi, Database Repairing and Consistent Query answering, 2011
 Suciu, Olteanu, Re, Koch, Probabilistic Databases, 2011 
Additional Information
Graduate Attributes and Skills 
Problemsolving, analytical thinking, independent learning, written communication. 
Special Arrangements 
While there are no formal prerequisites, it is recommended that students have passed an introductory course in Databases such as the undergraduate course Introduction to Databases; in particular, some familiarity with the relational model, the main relational query languages (calculus and algebra), and integrity constraints is welcome. It is also recommended that students have some basic familiarity with complexity theory (standard complexity classes such as PTIME and NP, and the notion of completeness). In any case, this course is selfcontained, and all the necessary tools will be properly introduced and explained during the lectures. 
Keywords  Relational data,Relational queries,Query evaluation,Static analysis of queries,Fast query evaluation 
Contacts
Course organiser  Dr Andreas Pieris
Tel: (0131 6)51 5606
Email: apieris@inf.ed.ac.uk 
Course secretary  Ms Lindsay Seal
Tel: (0131 6)50 2701
Email: lindsay.seal@ed.ac.uk 

