For a query such as `SELECT AVG(gpa) FROM enroll GROUP BY college, gender`, BlinkDB will sample `enroll` stratified on `college, gender` and use the sampled data to evaluate queries very quickly.

Sometimes the exact queries are known ahead of time, down to the predicate, e.g. `WHERE City = 'New York'`. Here, we can use materialized views.

Other times we know only which columns are filtered on, e.g. `City`, but we don't know what value is being filtered. This is the regime in which BlinkDB operates.

Queries can carry an error bound or a time bound: `SELECT ... ERROR WITHIN 10% AT CONFIDENCE 95%` or `SELECT ... WITHIN 5 SECONDS`.

Consider the query `SELECT a, b, c, SUM(D) FROM R GROUP BY a, b, c;`. For each unique value of `(a, b, c)` (e.g. `(1, 1, 1)`), we create a simple random sample.

How large should the sample be for each value of `(a, b, c)`? Given a maximum number of allowable rows $n$, we make each stratified simple random sample as big as possible so that the sum over all samples is less than or equal to $n$.

Given a query $j$, say the coverage $y_j$ is the maximum overlap of a sampled query column set (QCS) that can be used to answer the query. For example, consider a query with $q_j = (A, C)$. We look at all the QCS that can be used to answer or partially answer the query:

       ABCD
      /    \
    ABC    ACD
      \    /
        AC
       /  \
      A    C
       \  /
        {}

If any of `ABCD`, `ABC`, or `ACD` is sampled, then we can answer queries for `AC` exactly. If `A` or `C` is sampled, we can use it to partially answer `AC` (that is, answer `AC` with a poorly matched sample).
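
The two ideas above, sizing per-stratum samples under a row budget $n$ and scoring how well a sampled QCS covers a query, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not BlinkDB's implementation: the function names `largest_cap`, `stratified_sample`, and `coverage` are hypothetical, "as big as possible" is read as a single uniform per-stratum cap `K`, and partial coverage is approximated by the fraction of query columns that a sampled subset contains.

```python
import random
from collections import defaultdict


def largest_cap(stratum_sizes, n):
    """Largest K such that sum(min(K, size)) <= n (assumed uniform cap)."""
    lo, hi = 0, max(stratum_sizes, default=0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        # The total sample size is non-decreasing in the cap, so binary search works.
        if sum(min(mid, s) for s in stratum_sizes) <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def stratified_sample(rows, key, n, seed=0):
    """Draw a simple random sample of at most K rows from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    cap = largest_cap([len(group) for group in strata.values()], n)
    return {k: rng.sample(group, min(cap, len(group)))
            for k, group in strata.items()}


def coverage(query_cols, sampled_qcs):
    """Coverage of a query by the best sampled QCS.

    A sampled superset of the query columns gives full coverage (1.0).
    A sampled subset gives partial coverage; the column-count fraction
    used here is an illustrative stand-in for "overlap".
    Column sets that are neither subsets nor supersets are skipped,
    mirroring the lattice above.
    """
    q = frozenset(query_cols)
    if not q:
        return 1.0
    best = 0.0
    for cols in sampled_qcs:
        s = frozenset(cols)
        if s >= q:
            return 1.0
        if s <= q:
            best = max(best, len(s) / len(q))
    return best
```

For example, `coverage("AC", ["ABC"])` yields full coverage because `ABC` is a superset of `AC`, while `coverage("AC", ["A"])` yields only partial coverage.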