Evaluating OLAP-style Queries
- Imagine we want to evaluate a join like
SELECT A.aa, B.bb, C.cc, SUM(F.f)
FROM F, A, B, C
WHERE F.a = A.a AND F.b = B.b AND F.c = C.c
A.x = 1 AND B.x = 2 AND C.x = 3
GROUP BY A.aa, B.bb, C.cc
- First, we can create a projection join index on
F
that simply stores the values of A.x
, B.x
, and C.x
. For example, if F is the following:
F(a, b, c)
+---+---+---+
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 2 | 1 |
| 1 | 2 | 2 |
| 2 | 1 | 1 |
+---+---+---+
- We can create the following indexes:
A.aa B.bb C.cc
+---+ +---+ +---+
| 6 | | 8 | | 4 |
| 6 | | 8 | | 3 |
| 6 | | 9 | | 4 |
| 6 | | 9 | | 3 |
| 7 | | 8 | | 4 |
+---+ +---+ +---+
A.x B.x C.x
+---+ +---+ +---+
| 6 | | 8 | | 4 |
| 6 | | 8 | | 3 |
| 6 | | 9 | | 4 |
| 6 | | 9 | | 3 |
| 7 | | 8 | | 4 |
+---+ +---+ +---+
- Now, we can compute the filter using these join indexes.
- To compute the group by using projection indexes over aa, bb, cc, and f, we simply scan through them and compute the groups in memory or do a sorting or hashing based group by.
- If instead we have a B+ tree over aa, bb, and cc, we can perform a nested for loop over the indexes computing the intersections of the corresponding bitmaps.
- If we cluster F according to these groups, then segmentation will help us avoid performing full bitmap intersections.
- We can also create an index mapping a group to its offset in the group-sorted file.