PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs (2012)

Graph processing frameworks like Pregel and GraphLab work well on tame graphs, but they perform poorly on the kinds of graphs that commonly appear in the wild. These graphs, dubbed natural graphs, have a power-law degree distribution in which a small number of vertices are incident to a large fraction of the edges. This paper presents a new graph processing abstraction (and an implementation of that abstraction) called PowerGraph that works well on both tame and natural graphs.

Graph-Parallel Abstractions

In graph processing frameworks like Pregel and GraphLab, a graph is distributed across a cluster. To express graph computation, users write vertex programs which are executed in parallel on each of the vertices in the graph. Here, we review Pregel and GraphLab.

Pregel executes in a bulk synchronous parallel fashion as a series of supersteps. In each superstep, a vertex program combines the messages sent to it by other vertices using an associative and commutative combiner, updates the vertex's state, and sends messages to other vertices. Computation ends when every vertex votes to halt.
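To make the model concrete, here is a minimal sketch of PageRank written as a Pregel-style vertex program. The Python rendering and the method names (compute, send_message, vote_to_halt, and so on) are illustrative, not Pregel's actual C++ API.

```python
# A minimal sketch of a Pregel-style vertex program (illustrative API).
# Messages from the previous superstep may arrive pre-combined by the
# associative, commutative combiner.

class PageRankVertex:
    def combine(self, m1, m2):
        # Combiner: the system may merge messages with this before delivery.
        return m1 + m2

    def compute(self, vertex, messages):
        # `messages` carries the rank mass sent by in-neighbors.
        vertex.rank = 0.15 + 0.85 * sum(messages)
        if vertex.superstep() < 30:
            share = vertex.rank / vertex.num_out_edges()
            for target in vertex.out_neighbors():
                vertex.send_message(target, share)
        else:
            vertex.vote_to_halt()  # computation ends when all vertices halt
```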

GraphLab executes asynchronously with a shared state abstraction. Data is stored both on vertices (where it is accessible to all of the vertex's neighbors) and on edges (where it is accessible only to the two incident vertices). Vertex programs directly read and write the data stored on neighboring vertices and edges. While vertex programs do not run synchronously, GraphLab ensures that the execution is serializable.
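For contrast, here is a sketch of the same PageRank computation in GraphLab's shared-state style, where the update function reads neighbor data directly instead of receiving messages. Again, the API is an illustrative Python rendering, not GraphLab's actual interface.

```python
# A sketch of a GraphLab-style update function (illustrative API). The
# program reads neighboring vertex data directly through shared state.

def pagerank_update(scope):
    # Read the ranks of in-neighbors directly; no messages involved.
    total = sum(nbr.rank / nbr.num_out_edges for nbr in scope.in_neighbors())
    scope.vertex.rank = 0.15 + 0.85 * total
    # GraphLab runs updates asynchronously but guarantees the resulting
    # execution is serializable.
```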

Both of these frameworks can be captured by a single gather-apply-scatter (GAS) abstraction in which computation is expressed as a gather ($g$), apply ($a$), and scatter ($s$) function. Data stored on vertex $u$ is denoted $D_u$, and data stored on edge $(u, v)$ is denoted $D_{(u, v)}$. For a vertex $u$, gather is run on each of $u$'s edges, and the resulting accumulators are summed together. Then, apply generates a new state for the vertex. Finally, the edges are updated using scatter.

  1. $\Sigma \gets \bigoplus_{v \in Nbr(u)} g(D_u, D_{(u,v)}, D_v)$
  2. $D_u^{new} \gets a(D_u, \Sigma)$
  3. $D_{(u,v)}^{new} \gets s(D_u^{new}, D_{(u,v)}, D_v)$
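Concretely, one execution of a vertex program under this abstraction might look like the following sketch. The engine and data-access names (graph.neighbors, graph.D, graph.D_edge) are our own conventions, not from the paper.

```python
# A sketch of the GAS semantics above for a single vertex u, given
# user-supplied gather (g), sum, apply (a), and scatter (s) functions.

def run_vertex(u, graph, gather, sum_, apply, scatter):
    # 1. Gather: fold an accumulator over u's edges.
    acc = None
    for v in graph.neighbors(u):
        partial = gather(graph.D[u], graph.D_edge[u, v], graph.D[v])
        acc = partial if acc is None else sum_(acc, partial)

    # 2. Apply: compute u's new state from the final accumulator.
    graph.D[u] = apply(graph.D[u], acc)

    # 3. Scatter: update the data on u's incident edges using the new state.
    for v in graph.neighbors(u):
        graph.D_edge[u, v] = scatter(graph.D[u], graph.D_edge[u, v], graph.D[v])
```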

Challenges of Natural Graphs

Pregel and GraphLab do not work well on natural graphs because, in both frameworks, the storage, communication, and computation overheads of a vertex are proportional to its degree. The frameworks can run different vertex programs on different machines in parallel, but they cannot parallelize the execution of a single (potentially very high-degree) vertex program across multiple machines.

PowerGraph Abstraction

The PowerGraph abstraction is essentially the GAS abstraction. Data is stored on vertices ($D_u$) and edges ($D_{(u,v)}$). PowerGraph vertex programs implement the GASVertexProgram interface with the following functions:

For a vertex $u$, gather collects data from $u$'s neighbors (the neighbor set is specified as none, all, in, or out). The intermediate accumulators generated by gather are combined by sum. apply uses the final accumulator to generate a new vertex state. scatter writes data to $u$'s neighboring edges, can return an optional delta accumulator (more on this later), and can activate other vertices to run (more on this later too). See the paper for three example PowerGraph programs. To work well on natural graphs, the size of accumulators and the runtime of apply should be constant in the degree of the vertex.
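As a concrete example, here is the paper's PageRank program rendered in this interface. The Python form, the activate stub, and the tolerance are our paraphrase; the paper gives the program in its own C++-like notation.

```python
def activate(v):
    # Stand-in for the engine's scheduling hook (hypothetical).
    pass

class PageRankProgram:
    def gather_nbrs(self):
        return "in"                       # gather over in-edges

    def gather(self, D_u, D_uv, D_v):
        # Constant-size accumulator: a single float, independent of degree.
        return D_v.rank / D_v.num_out_edges

    def sum(self, a, b):
        return a + b

    def apply(self, D_u, acc):
        new_rank = 0.15 + 0.85 * acc
        D_u.delta = new_rank - D_u.rank   # remembered for scatter
        D_u.rank = new_rank               # constant-time, degree-independent
        return D_u

    def scatter_nbrs(self):
        return "out"                      # scatter over out-edges

    def scatter(self, D_u, D_uv, D_v):
        if abs(D_u.delta) > 1e-4:         # tolerance is illustrative
            activate(D_v)                 # schedule the neighbor to run
        return D_uv                       # edge data unchanged here
```

Note that the accumulator is a single float and apply does constant work, so the program meets the constant-size and constant-time requirements above.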

Delta Caching

When a neighbor $v$ of vertex $u$ updates its state and activates the execution of $u$, most of $u$'s other neighbors have not changed, so re-running gather on all of them is largely a waste. PowerGraph therefore caches the accumulator $a_u$ from $u$'s most recent gather. When the accumulator type forms an Abelian group under sum (that is, sum is commutative and associative and every accumulator has an inverse), scatter can return a delta $\Delta a_v = g(D_u, D_{(u,v)}^{new}, D_v^{new}) - g(D_u, D_{(u,v)}, D_v)$, where the subtraction uses the group inverse, and $a_u$ is updated to be $a_u + \Delta a_v$. This updated $a_u$ is equivalent to re-running $g$ on all of $u$'s neighbors, so $u$ can skip its gather phase entirely. When scatter does not return a delta, the cached $a_u$ is cleared and $u$'s gather is re-run in its entirety.
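Here is a sketch of how an engine might maintain these cached accumulators, reusing the graph access conventions from the earlier sketch. The cache structure and function names are illustrative, and the accumulator type is assumed to support + and - (the Abelian-group requirement).

```python
# A sketch of delta caching in the engine (illustrative names). cache[u]
# holds the accumulator from u's last complete gather.

cache = {}

def on_scatter(u, v, delta):
    # v's scatter over edge (u, v) returned `delta`: the new gather
    # contribution minus the old one (this needs the additive inverse).
    if delta is not None and u in cache:
        cache[u] = cache[u] + delta   # equivalent to re-gathering all neighbors
    else:
        cache.pop(u, None)            # force a full gather next time

def gather_phase(u, graph, gather, sum_):
    if u in cache:
        return cache[u]               # cached accumulator: skip the gather
    acc = None
    for v in graph.neighbors(u):
        partial = gather(graph.D[u], graph.D_edge[u, v], graph.D[v])
        acc = partial if acc is None else sum_(acc, partial)
    cache[u] = acc
    return acc
```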

Initiating Future Computation

Vertex programs can activate neighboring vertex programs to execute. The order in which vertices execute is controlled by the system. PowerGraph supports two modes of execution: bulk synchronous and asynchronous. In bulk synchronous mode, gather, sum, apply, and scatter are run in lockstep across all active vertices. In asynchronous mode, vertices run whenever they are activated, and, like GraphLab, PowerGraph ensures serializable execution.
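Here is a sketch of the bulk synchronous mode, reusing the gather_phase helper from the delta-caching sketch above. As a simplifying convention (a variation on the interface sketched earlier), scatter is assumed to return the new edge data along with a flag saying whether to activate the neighbor.

```python
# A sketch of bulk synchronous execution (illustrative engine). Each
# phase runs in lockstep across all active vertices, with a barrier
# between phases; scatter's activations form the next round.

def bulk_synchronous(graph, program, active):
    while active:
        # Gather + sum run for every active vertex, then a barrier.
        accs = {u: gather_phase(u, graph, program.gather, program.sum)
                for u in active}
        # Apply runs in lockstep, then another barrier.
        for u in active:
            graph.D[u] = program.apply(graph.D[u], accs[u])
        # Scatter runs in lockstep; activated vertices form the next round.
        next_active = set()
        for u in active:
            for v in graph.neighbors(u):
                graph.D_edge[u, v], activate = program.scatter(
                    graph.D[u], graph.D_edge[u, v], graph.D[v])
                if activate:
                    next_active.add(v)
        active = next_active
```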

Comparison with GraphLab and Pregel

PowerGraph is at least as expressive as GraphLab and Pregel: we can translate any GraphLab or Pregel program into an equivalent PowerGraph program. For GraphLab, the gather and sum functions concatenate all the data on neighboring vertices and edges, and apply executes the GraphLab update function. For Pregel, gather and sum collect the messages (stored on edges) and a list of neighbors, apply runs the Pregel vertex program to compute the messages to be sent to the neighbors, and scatter writes each message to the appropriate edge.
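For instance, the Pregel translation might look like the following sketch, where each edge carries the message destined for its target. The adapter and field names are ours, not from the paper.

```python
# Sketch: emulating a Pregel vertex program in PowerGraph (illustrative).
# Messages live on edges; gather/sum collect them, apply runs the Pregel
# compute function, and scatter writes the outgoing messages.

class PregelAdapter:
    def __init__(self, pregel_compute):
        self.compute = pregel_compute

    def gather(self, D_u, D_uv, D_v):
        # Collect the message stored on this in-edge, tagged with the
        # neighbor's id so apply also learns the list of neighbors.
        return [(D_v.id, D_uv.message)]

    def sum(self, acc1, acc2):
        return acc1 + acc2            # list concatenation

    def apply(self, D_u, acc):
        # Run the Pregel compute, producing messages keyed by neighbor id.
        D_u.outbox = self.compute(D_u, [m for _, m in acc if m is not None])
        return D_u

    def scatter(self, D_u, D_uv, D_v):
        # Write this neighbor's message onto the shared edge.
        D_uv.message = D_u.outbox.get(D_v.id)
        return D_uv
```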

Distributed Graph Placement

In order to execute graph algorithms in parallel, we have to distribute the graph across a cluster of machines. Pregel and GraphLab do so with edge cuts: they divide the vertices of the graph (roughly) evenly across the machines and try to minimize the number of edges that cross machine boundaries. The more cut edges, the more communication and storage overhead. In practice, both frameworks often fall back to assigning vertices uniformly at random across the $p$ machines. A randomly placed edge has both endpoints on the same machine with probability only $\frac{1}{p}$, so an expected $1 - \frac{1}{p}$ fraction of the edges (for large $p$, almost all of them) are cut.

PowerGraph instead executes a single vertex program across multiple machines by performing a vertex cut. Here, edges are assigned (roughly) evenly to the machines, and the number of machines each vertex spans is minimized. A vertex with edges on multiple machines is replicated on each of them; one replica is designated the master and the rest are mirrors. gather is executed in parallel on all replicas, the partial accumulators are sent to the master, the master runs apply and sends the new vertex data back to the mirrors, and scatter is then run in parallel on all replicas. Natural graphs admit very good vertex cuts, and even randomly assigning edges to machines works well.
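One execution of a split vertex might proceed as in this sketch. The replica helpers (local_gather, local_scatter, data, set_data) are assumed for illustration, not PowerGraph's actual API.

```python
# Sketch of one vertex-program execution under a vertex cut
# (illustrative). Each machine holds a replica of u and a subset of
# u's edges; the first replica is the master, the rest are mirrors.

def run_split_vertex(u, replicas, program):
    # 1. Each replica gathers over its local edges, in parallel.
    partials = [r.local_gather(u, program.gather, program.sum)
                for r in replicas]

    # 2. Partial accumulators are sent to the master and summed there.
    master = replicas[0]
    acc = partials[0]
    for p in partials[1:]:
        acc = program.sum(acc, p)

    # 3. The master applies, then broadcasts the new vertex data
    #    to its mirrors.
    new_data = program.apply(master.data(u), acc)
    for r in replicas:
        r.set_data(u, new_data)

    # 4. Each replica scatters over its local edges, in parallel.
    for r in replicas:
        r.local_scatter(u, program.scatter)
```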

The following greedy vertex cut heuristic produces even better partitions than random edge placement. Let $A(u)$ denote the set of machines on which $u$ is currently replicated. For every edge $(u, v)$: