Working on subquery optimizations, got an idea how to speed up join execution a little. Read on.
Consider a query:
select * from t1, t2, t3 where t3.key=t1.col1 and t2.key=t1.col2
Suppose the join order is t1, t2, t3, i.e. the EXPLAIN is like
+-------+------+---------------+------+---------+--------------+-.. | table | type | possible_keys | key | key_len | ref | +-------+------+---------------+------+---------+--------------+-.. | t1 | ALL | NULL | NULL | NULL | NULL | | t2 | ref | key | key | 5 | test.t1.col1 | | t3 | ref | key | key | 5 | test.t1.col2 | +-------+------+---------------+------+---------+--------------+-..
The important property is that access to t3 is independent of access to t2. MySQL’s nested loops join algorithm will run this as in this swimlane diagram:
Here we assume that
- table t2 has 4 rows such that t2.key=t1.col1
- table t3 doesn’t have any rows such that t3.key=t1.col2
As soon as we make first index lookup in table t3 (the one with blue border), we see that there will be no matching row combinations for this row of t1. Nevertheless, MySQL executioner will proceed to examine different rows in table t2 (marked with red border). This is redundant and can be easily avoided.
The executioner part is easy: just maked the nested-loops join code to “jump back” in the cases like the illustrated. If we don’t find a match and table access/selection condition does not depend on the preceding table(s), then go back to the last table that the table access depends on. I don’t have the ready term for this, the working name is “short-cutting”.
The optimizer part is (as usual) more complicated. One way is to take the easy route: let the join optimizer (the part of code that chooses the join order) remain unaware of the short-cutting. Once the join order is produced, set up the executioner to perform short-cutting where appopriate.
The bad part is that join optimizer doesn’t account for short-cutting when comparing costs of various join orders, which may lead it to choose non-optimal join orders. However, taking short-cutting into account won’t be easy:
- First, we’ll need to generate selection conditions for the join orders we consider (in other words, figure out if the EXPLAIN will have “Using WHERE”). Our current way of doing this will probably be too expensive to be done for each considered join order.
- Second, we’ll need to get somewhere an estimate of how often short-cutting will occur. That is, we’ll need to know probability that for some *arbitrary* values of columns from preceding tables, table T will have no rows that would satisfy given condition COND(preceding_tables, T). This estimate is likely to end up being some while guess like “lets use
At the moment the easy route seems like the way to go.
2007-02-11: Fixed typos in EXPLAIN ouput
Perhaps another way to think of it is “pruning the execution tree” if you think of the nested-loop join as a depth-first search.
Maybe, I’m not a terminology guru. Thanks for feedback, I’ll keep the “pruning the execution tree” option for the time we’ll be settling with which terms to use.
Can you show us the optimised version of the initial SQL or does “The executioner part is easy: just maked the nested-loops join code to ‘jump back'” explain it and I just need to learn what the ‘executioner’ is?
Once you should look at WL#358 too. I think it’s an idea to speed up things too, and help client application developers. (My idea.)