Certain Answers Meet Zero-One Laws
The talk will start with a brief overview of querying incomplete information in databases and its main computational challenges. I will present a summary of the work that has come out of the Edinburgh group over the past several years, before concentrating on the core of the talk, reflected in the title. Querying incomplete data invariably relies on the very coarse classification of query answers into those that are certain and those that are not. Computing such a classification is often very costly, and we refine it by measuring how close an answer is to certainty.
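As a point of reference, the coarse classification mentioned above is usually formalised via certain answers; the notation below, with [[D]] standing for the set of complete databases that an incomplete database D can represent, is the standard one from the incomplete-information literature rather than anything specific to this talk:

\[
  \mathrm{cert}(Q, D) \;=\; \bigcap_{D' \in [\![D]\!]} Q(D'),
\]

so a tuple is a certain answer exactly when Q returns it under every interpretation of the missing values, and all other tuples fall into the single class of non-certain answers.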
This measure is defined as the probability that the query is true under a random interpretation of missing information in a database. Since there are infinitely many such interpretations, to pick one at random we adopt the approach used in the study of asymptotic properties and 0-1 laws for logical sentences, and define the measure as the limit of a sequence of probabilities over finite domains of growing size. We show that in the standard model of missing data the 0-1 law is observed: this limit always exists and can only be 0 or 1 for a very large class of queries. Thus, query answers are either almost certainly true or almost certainly false, and this classification behaves very well computationally. When databases satisfy constraints, the measure is defined as the conditional probability of the query being true given that the constraints hold. This measure can now be an arbitrary rational number, but it remains computable. Another refinement of the notion of certainty compares answers by the sets of interpretations that make them true: an answer supported by a larger set of interpretations is considered better. We pinpoint the exact complexity of finding such best answers for first-order queries.
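One way to read the definitions sketched above (the notation \mu_n, \mathrm{Nulls}(D) and [n] below is illustrative rather than the talk's own): for a database D with nulls and a query Q, take all valuations of the nulls into a finite domain of size n, measure the fraction under which Q holds, and let n grow:

\[
  \mu_n(Q, D) \;=\;
  \frac{\bigl|\{\, v : \mathrm{Nulls}(D) \to [n] \;\mid\; v(D) \models Q \,\}\bigr|}
       {\bigl|\{\, v : \mathrm{Nulls}(D) \to [n] \,\}\bigr|},
  \qquad
  \mu(Q, D) \;=\; \lim_{n \to \infty} \mu_n(Q, D).
\]

The 0-1 law then says that this limit exists and equals 0 or 1. In the presence of a set of constraints \Sigma, the conditional variant would correspondingly be

\[
  \mu(Q \mid \Sigma, D) \;=\; \lim_{n \to \infty} \frac{\mu_n(Q \wedge \Sigma, D)}{\mu_n(\Sigma, D)},
\]

which, as stated above, can be an arbitrary rational number while remaining computable.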