Declarative Datalog Debugging for Mere Mortals (2012)

Debugging datalog programs is hard. Given a buggy model $M = P(I)$ of a datalog program $P$ run on database instance $I$, we might want to know why certain tuples in $M$ appear that shouldn't. This paper presents a technique to rewrite $P$ to a modified program $\hat{P}$ which computes a modified model $\hat{M}$ which includes the provenance of every tuple in $M$.

Overview

The modified model $\hat{M}$ represents a provenance graph $(V, E)$ where a vertex is a rule instance $r$ or an atom $a$. An edge $r \to a$ indicates that rule instance $r$ derived atom $a$ (i.e. $a$ is the head of $r$), and an edge $a \to r$ indicates that $a$ appears in the body of $r$. Well, this isn't quite true. The graph also includes the iterations in which a tuple is derived using a naive evaluation strategy.

The rewrite $P \to \hat{P}$ proceeds in three steps: rule firings, graph reification, and statelog rewriting. The first two construct the provenance graph; the third adds the iteration information.

Rule Firings

First, we must capture every rule firing. We rewrite a rule of the form:

\(
\begin{array}{lrcl}
  r: & H(\bar{Y}) &\coleq& B_1(\bar{X_1}), \ldots, B_n(\bar{X_n})
\end{array}
\)

into two rules:

\(
\begin{array}{lrcl}
  r_{in}:  & \fire_r(\bar{X}) &\coleq& B_1(\bar{X_1}), \ldots, B_n(\bar{X_n}) \\
  r_{out}: & H(\bar{Y}) &\coleq& \fire_r(\bar{X})
\end{array}
\)

where $\bar{X} \defeq \cup_i \bar{X_i}$.

Graph Reification

Next, we use Skolemization to convert $r_{in}$ and $r_{out}$ into concrete nodes in our provenance graph. For each pair $r_{in}$ and $r_{out}$, we generate the following $n$ rules for edges into $\fire_r(\bar{X})$.

\(
\begin{array}{rcl}
  g(B_i(\bar{X_i}), \rin, \fire_r(\bar{X})) &\coleq& \fire_r(\bar{X})
\end{array}
\)

We also generate an edge from $\fire_r(\bar{X})$ to $H(\bar{Y})$:

\(
\begin{array}{rcl}
  g(\fire_r(\bar{X}), \rout, H(\bar{Y})) &\coleq& \fire_r(\bar{X})
\end{array}
\)

Statelog Rewriting

Finally, we track the iterations at which a rule is derived under a naive evaluation scheme. We replace $r_{in}$ and $r_{out}$.

\(
\begin{array}{lrcl}
  r_{in}: & \fire_r(S1, \bar{X}) &\coleq&
    B_1(S, \bar{X_1}), \ldots, B_n(S, \bar{X_n}), \next(S, S1) \\
  r_{out}: & H(S, \bar{Y}) &\coleq& \fire_r(S, \bar{X})
\end{array}
\)

Every iteration S has a successor so long as tuples are produced:

\(
\begin{array}{rcl}
  \next(0, 1) &\coleq& \textsf{true} \\
  \next(S, S1) &\coleq& \next(\_, S), \textsf{newAtom}(S, A), S1=S+1 \\
  \textsf{newAtom}(S1, A) &\coleq&
    \next(S, S1), g(S1, \_, \rout, A), \lnot g(S, \_, \rout, A)
\end{array}
\)

Debugging and Profiling

Given a provenance graph $\hat{M} = \hat{P}(I)$, we can perform the following debugging and profiling tasks.