SELECT * FROM R WHERE s1() and s2();
where both predicates have equal selectivity but one is much faster than the other. If we pipeline the two predicates, we should run the faster one first. In an eddy, because of back pressure, the faster one will end up getting more tuples even though we don’t know a priori which one is faster.

Consider the same query
SELECT * FROM R WHERE s1() and s2();
but now assume that both predicates take the same amount of time but one is much more selective, i.e. it discards a larger fraction of its input. In a pipelined plan, we should put the more selective operator first.

For the join
SELECT * FROM R, S, T WHERE R.a = S.a and S.b = T.b
we would like to perform the more selective join (the one that produces fewer tuples) first. The eddy’s lottery scheduling will do this automatically without fancy selectivity estimation.

Dynamically re-ordering joins can do better than any fixed order. For example, imagine the join Q(a, b) :- R(a), S(a, b), T(b)
with the following relations:
R(a)    S(a, b)    T(b)
1       (2, 3)     3
1       (2, 3)     3
1       (1, 4)     3
1       (1, 4)     3
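To make the comparison concrete, here is a minimal Python sketch that counts tuple comparisons for the two fixed probe orders and for per-tuple routing. It assumes S holds the tuples (2, 3), (2, 3), (1, 4), (1, 4), that each S tuple is probed against R and T with a simple nested-loop scan, and that the router already knows the cheaper order for every tuple (an oracle, which a real eddy would only approximate by learning). The helper names `probe_r`, `probe_t`, and `cost` are illustrative only.

```python
# Relations from the example: R(a), S(a, b), T(b).
R = [1, 1, 1, 1]
S = [(2, 3), (2, 3), (1, 4), (1, 4)]
T = [3, 3, 3, 3]


def probe_r(s_tuple):
    """Scan R for tuples matching S.a; return (matches, comparisons)."""
    a, _ = s_tuple
    return sum(1 for r in R if r == a), len(R)


def probe_t(s_tuple):
    """Scan T for tuples matching S.b; return (matches, comparisons)."""
    _, b = s_tuple
    return sum(1 for t in T if t == b), len(T)


def cost(s_tuple, first, second):
    """Comparisons needed to push one S tuple through both probes in order."""
    matches, comparisons = first(s_tuple)
    # Each intermediate result of the first probe is probed against the second relation.
    _, second_comparisons = second(s_tuple)
    return comparisons + matches * second_comparisons


# Fixed orders: every S tuple takes the same route.
r_first = sum(cost(s, probe_r, probe_t) for s in S)
t_first = sum(cost(s, probe_t, probe_r) for s in S)

# Per-tuple routing with an oracle: each S tuple takes its cheaper route.
adaptive = sum(min(cost(s, probe_r, probe_t), cost(s, probe_t, probe_r)) for s in S)

print(r_first, t_first, adaptive)
```

Under these assumptions either fixed order pays for intermediate results that the other join immediately discards, while per-tuple routing eliminates every S tuple with a single scan.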
If we join the first two tuples of S with all of R and then the last two tuples of S with all of T, we do better than any fixed ordering of the joins.

Some things are harder for an eddy to handle (e.g. a query whose predicate spans all three relations, such as
SELECT * FROM R, S, T WHERE R.a + S.b = T.c
, avoiding cross products, sorting into a sort merge join). In these cases, eddies aren’t as flexible.
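As a rough illustration of the lottery scheduling mentioned earlier for the equal-cost, different-selectivity case, the sketch below simulates a simplified ticket scheme: an operator gains a ticket when the eddy hands it a tuple and loses one when it returns the tuple, and the eddy picks the first operator for each new tuple with probability proportional to its tickets. The pass probabilities (0.9 for `s1`, 0.1 for `s2`), the tuple count, and the function name `simulate` are assumptions made for the example, not values from the source, and the model ignores operator cost and back pressure.

```python
import random


def simulate(pass_prob, num_tuples=10_000, seed=0):
    """Route tuples through two filters with simplified ticket-based routing.

    pass_prob maps an operator name to the probability that a tuple passes it.
    Returns how often each operator was chosen as the first stop for a tuple.
    """
    rng = random.Random(seed)
    tickets = {op: 1 for op in pass_prob}
    routed_first = {op: 0 for op in pass_prob}

    def run(op):
        tickets[op] += 1  # credit: the operator received a tuple
        passed = rng.random() < pass_prob[op]
        if passed:
            tickets[op] -= 1  # debit: the operator returned the tuple
        return passed

    ops = list(pass_prob)
    for _ in range(num_tuples):
        # Lottery: choose the first operator with probability proportional to tickets.
        first = rng.choices(ops, weights=[tickets[op] for op in ops])[0]
        routed_first[first] += 1
        if run(first):
            # The tuple survived, so it still has to visit the other operator.
            other = next(op for op in ops if op != first)
            run(other)
    return routed_first


counts = simulate({"s1": 0.9, "s2": 0.1})
print(counts)
```

Because `s2` drops most of its input, it keeps most of the tickets it is credited and is increasingly chosen first, which is the ordering a static optimizer would need a selectivity estimate to find.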