Nested outer joins

Here is MySQL’s nested outer joins optimization cheat sheet:

  • Conversions:
    • RIGHT JOIN is converted to LEFT JOIN. FULL JOIN is not supported.
    • Outer joins are converted to inner joins when possible
  • Constraints on join order:

    • “Outer tables go first”
    • “No interleaving”
  • Table access rules:

    • “Inner” table access methods are constructed from parts of the ON condition. WHERE condition can’t be used to construct table accesses.
    • Parts of ON condition are checked as soon as possible
    • Parts of the WHERE condition
      – are not checked until we’ve found a row combination that matches the ON clause
      – are checked as soon as possible after that.

Or, in more detail:

Conversions

RIGHT JOIN to LEFT JOIN conversion is obvious:

  (t1 RIGHT JOIN t2 ON cond) = (t2 LEFT JOIN t1 ON cond)

Conversion from outer to inner join is possible when the result of inner join will be the same. It will be the same if the row combination with NULL-complimented row will not pass the WHERE clause. For example, if we look at the query

  t1 LEFT JOIN t2 ON some_cond WHERE t2.a=t1.b

we’ll see that a row with t2.a IS NULL will not satisfy the WHERE condition. Hence, this outer join can be converted to inner.

Constraints on join order

Outer tables go first
any outer table used in the outer join’s ON clause must be before all of the inner tables.
No interleaving
tables contained within an outer join must form a continuous sequence in the join order. Interleaving with tables that are outside of the outer join is not allowed.

Table access rules

Now, this requires some explanation. MySQL’s nested-loops join code tries to check parts of the WHERE as soon as
possible. For example when a query

SELECT * FROM
  t1,t2, ...
WHERE
  t1.col1=c1 AND
  t2.col1=t1.col2 AND t2.col2=c3 AND
  ...

is executed using a join order of (t1, t2,…), it proceeds according to this kind of scenario:

Inner join swimlanes

We can see here that the the WHERE condition is split into parts that are checked “as early as possible”.

With outer joins is more complicated. We need to know if we’ll need to generate a NULL-complemented row combination. We won’t need to if there was a combination of inner tables that matched the ON (but not necessarily the WHERE) clause. The solution is to switch the WHERE parts checking on and off.

The best way to show it is with example: Suppose we have a query

 SELECT * FROM ... ot1 LEFT JOIN (it1, it2) ON somecond WHERE ...

and suppose the join order is (ot1, it1, it2, …). The execution will proceed in this manner:

outer join swimlanes

What’s visible there? When we start scanning table it1, we check only the ON condition. We can’t check the WHERE – we could iscard some it1’s row that is the only row that will match the ON condition, think there will be no matches, and erroneously generate the NULL-complimented row.

After we’ve found the match for the ON condition, we go back and check all parts of the WHERE we did not check because of the above mentioned reason.

After that, the execution proceeds as if this was an inner join, with ON merged into the WHERE clause.

The diagram also shows why we can’t use parts of the WHERE clause to create table acccess methods: because there are times when we can’t use parts of the WHERE for filtering. We always can use parts of the ON though.

Now it should be clear where all Table Access Rules came from.

2 Comments

  1. werutzb says:

    Hi!

    I want to improve my SQL experience.
    I red that many SQL books and want to
    read more about SQL for my occupation as mysql database manager.

    What can you recommend?

    Thanks,
    Werutz

    Like

Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s