$\newcommand{\fire}{\textsf{fire}}$ $\newcommand{\next}{\textsf{next}}$ $\newcommand{\rin}{\textsf{in}}$ $\newcommand{\rout}{\textsf{out}}$
Debugging datalog programs is hard. Given a buggy model $M = P(I)$ of a datalog program $P$ run on database instance $I$, we might want to know why certain tuples in $M$ appear that shouldn't. This paper presents a technique to rewrite $P$ to a modified program $\hat{P}$ which computes a modified model $\hat{M}$ which includes the provenance of every tuple in $M$.
The modified model $\hat{M}$ represents a provenance graph $(V, E)$ where a vertex is a rule instance $r$ or an atom $a$. An edge $r \to a$ indicates that rule instance $r$ derived atom $a$ (i.e. $a$ is the head of $r$), and an edge $a \to r$ indicates that $a$ appears in the body of $r$. Well, this isn't quite true. The graph also includes the iterations in which a tuple is derived using a naive evaluation strategy.
The rewrite $P \to \hat{P}$ proceeds in three steps: rule firings, graph reification, and statelog rewriting. The first two construct the provenance graph; the third adds the iteration information.
First, we must capture every rule firing. We rewrite a rule of the form:
\(
\begin{array}{lrcl}
r: & H(\bar{Y}) &\coleq& B_1(\bar{X_1}), \ldots, B_n(\bar{X_n})
\end{array}
\)
into two rules:
\(
\begin{array}{lrcl}
r_{in}: & \fire_r(\bar{X}) &\coleq& B_1(\bar{X_1}), \ldots, B_n(\bar{X_n}) \\
r_{out}: & H(\bar{Y}) &\coleq& \fire_r(\bar{X})
\end{array}
\)
where $\bar{X} \defeq \cup_i \bar{X_i}$.
Next, we use Skolemization to convert $r_{in}$ and $r_{out}$ into concrete nodes in our provenance graph. For each pair $r_{in}$ and $r_{out}$, we generate the following $n$ rules for edges into $\fire_r(\bar{X})$.
\(
\begin{array}{rcl}
g(B_i(\bar{X_i}), \rin, \fire_r(\bar{X})) &\coleq& \fire_r(\bar{X})
\end{array}
\)
We also generate an edge from $\fire_r(\bar{X})$ to $H(\bar{Y})$:
\(
\begin{array}{rcl}
g(\fire_r(\bar{X}), \rout, H(\bar{Y})) &\coleq& \fire_r(\bar{X})
\end{array}
\)
Finally, we track the iterations at which a rule is derived under a naive evaluation scheme. We replace $r_{in}$ and $r_{out}$.
\(
\begin{array}{lrcl}
r_{in}: & \fire_r(S1, \bar{X}) &\coleq&
B_1(S, \bar{X_1}), \ldots, B_n(S, \bar{X_n}), \next(S, S1) \\
r_{out}: & H(S, \bar{Y}) &\coleq& \fire_r(S, \bar{X})
\end{array}
\)
Every iteration S
has a successor so long as tuples are produced:
\(
\begin{array}{rcl}
\next(0, 1) &\coleq& \textsf{true} \\
\next(S, S1) &\coleq& \next(\_, S), \textsf{newAtom}(S, A), S1=S+1 \\
\textsf{newAtom}(S1, A) &\coleq&
\next(S, S1), g(S1, \_, \rout, A), \lnot g(S, \_, \rout, A)
\end{array}
\)
Given a provenance graph $\hat{M} = \hat{P}(I)$, we can perform the following debugging and profiling tasks.