Here is MySQL’s nested outer joins optimization cheat sheet:
- RIGHT JOIN is converted to LEFT JOIN. FULL JOIN is not supported.
- Outer joins are converted to inner joins when possible
Constraints on join order:
- “Outer tables go first”
- “No interleaving”
Table access rules:
- “Inner” table access methods are constructed from parts of the ON condition. WHERE condition can’t be used to construct table accesses.
- Parts of ON condition are checked as soon as possible
- Parts of the WHERE condition
– are not checked until we’ve found a row combination that matches the ON clause
– are checked as soon as possible after that.
Or, in more detail:
RIGHT JOIN to LEFT JOIN conversion is obvious:
(t1 RIGHT JOIN t2 ON cond) = (t2 LEFT JOIN t1 ON cond)
Conversion from outer to inner join is possible when the result of inner join will be the same. It will be the same if the row combination with NULL-complimented row will not pass the WHERE clause. For example, if we look at the query
t1 LEFT JOIN t2 ON some_cond WHERE t2.a=t1.b
we’ll see that a row with t2.a IS NULL will not satisfy the WHERE condition. Hence, this outer join can be converted to inner.
Constraints on join order
- Outer tables go first
- any outer table used in the outer join’s ON clause must be before all of the inner tables.
- No interleaving
- tables contained within an outer join must form a continuous sequence in the join order. Interleaving with tables that are outside of the outer join is not allowed.
Table access rules
Now, this requires some explanation. MySQL’s nested-loops join code tries to check parts of the WHERE as soon as
possible. For example when a query
SELECT * FROM t1,t2, ... WHERE t1.col1=c1 AND t2.col1=t1.col2 AND t2.col2=c3 AND ...
is executed using a join order of (t1, t2,…), it proceeds according to this kind of scenario:
We can see here that the the WHERE condition is split into parts that are checked “as early as possible”.
With outer joins is more complicated. We need to know if we’ll need to generate a NULL-complemented row combination. We won’t need to if there was a combination of inner tables that matched the ON (but not necessarily the WHERE) clause. The solution is to switch the WHERE parts checking on and off.
The best way to show it is with example: Suppose we have a query
SELECT * FROM ... ot1 LEFT JOIN (it1, it2) ON somecond WHERE ...
and suppose the join order is (ot1, it1, it2, …). The execution will proceed in this manner:
What’s visible there? When we start scanning table it1, we check only the ON condition. We can’t check the WHERE – we could iscard some it1’s row that is the only row that will match the ON condition, think there will be no matches, and erroneously generate the NULL-complimented row.
After we’ve found the match for the ON condition, we go back and check all parts of the WHERE we did not check because of the above mentioned reason.
After that, the execution proceeds as if this was an inner join, with ON merged into the WHERE clause.
The diagram also shows why we can’t use parts of the WHERE clause to create table acccess methods: because there are times when we can’t use parts of the WHERE for filtering. We always can use parts of the ON though.
Now it should be clear where all Table Access Rules came from.