IN/=ANY Subqueries NULL woes

NULL values are suprisingly hazardous for MySQL’s way of optimizing IN subqueries. MySQL takes the ‘=’ from ‘=ANY’ and tries to push it down into the subquery, essentially changing IN subquery into EXISTS subquery. That is, a subquery like

outer_expr IN (SELECT inner_expr FROM ... WHERE subq_where)

is converted to:

EXISTS (SELECT 1 FROM ... WHERE subq_where AND outer_expr=inner_expr)

and then MySQL can use the pushed-down equality to limit number of rows it has to look through when running the subquery. This “pushdown” works as long as outer_expr and inner_expr can’t be NULL or you don’t care whether subquery result is NULL or FALSE (MySQL figures that you don’t care if the subquery is a part of OR or AND expression in the WHERE clause).When you start caring about producing correct NULL or FALSE results, problems start coming from left and right, literally:

NULL problem in the right part

Suppose outer_expr is not a NULL value but the subquery doesn’t produce a row
such that outer_expr=inner_expr.
Then “non_null_outer_expr IN (SELECT …)” evaluates to

  • FALSE, if the SELECT produces only non-null values, or produces nothing
  • NULL, if the SELECT produces (among others) a row with inner_expr IS NULL.

The “look for rows with inner_expr=outer_expr” approach is not valid anymore. You have to use a more complicated “look for rows with inner_expr=outer_expr, and if nothing is found, look for rows with inner_expr IS NULL”. Roughly speaking, the subquery can be converted to:

EXISTS (SELECT 1 FROM ... WHERE subq_where AND
(outer_expr=inner_expr OR
inner_expr IS NULL))

The need to evaluate this is the reason why MySQL has ‘ref_or_null’ access method:

mysql> EXPLAIN
-> SELECT outer_expr IN (SELECT t2.maybe_null_key
->                       FROM t2, t3 WHERE ...)
-> FROM t1;
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: t1
...
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: t2
type: ref_or_null
possible_keys: maybe_null_key
key: maybe_null_key
key_len: 5
ref: func
rows: 2
Extra: Using where; Using index
...

Subqery-specific access methods, unique_subquery and index_subquery, too have or-null variants. They are not visible in EXPLAIN output, you need to use EXPLAIN EXTENDED and look through the produced warnings:

mysql> EXPLAIN EXTENDED
-> SELECT outer_expr IN (SELECT maybe_null_key FROM t2) FROM t1;
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: t1
...
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: t2
type: index_subquery
possible_keys: maybe_null_key
key: maybe_null_key
key_len: 5
ref: func
rows: 2
Extra: Using index
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select (`test`.`t1`.`outer_expr`,
(((`test`.`t1`.`outer_expr`) in t2 on
maybe_null_key checking NULL))) AS `outer_expr IN (SELECT
maybe_null_key FROM t2)` from `test`.`t1`

The “OR … IS NULL” makes things slightly more complicated (and some optimizations within the subquery become impossible), but generally this is tolerable.

NULL hazard in the left part

The situation is much worse when you get NULL for outer_expr. According to NULL handling rules, NULL IN (SELECT inner_expr …) should evaluate to

  • FALSE, if the SELECT produces no rows.
  • NULL, if the SELECT produces any rows.

We’ll need to be able to check if the SELECT has produced any rows at all, so we can’t add “outer_expr = inner_expr” into the subquery. This is a problem, because lots of real world subqueries become very slow if we don’t push the equality down. At the moment, MySQL’s choice is to be fast rather than correct, and you may get FALSE where you should have gotten NULL. There seems to be no easy solution – essentially you need to have two different ways to execute the subquery depending on what is the value of left_expr is.

Me, Igor and Evgen are working on a fix. Hopefully I’ll be able post a resolution update soon.

Partitioning optimizations documentation is available

It’s been nine months since partition pruning code has been pushed into MySQL 5.1. Another available optimization, partition selection, has been in the main tree for even longer. There haven’t been any bugs reported for some time, so the code should be reasonably stable now.

And if you’re interested in what’s under the hood, a rather detailed description of partitioning optimizations is now available here:

http://dev.mysql.com/doc/internals/en/partitioning-optimizations.html

It is a part of the internals manual, but I tried to write it so it doesn’t
require any knowledge of MySQL source code. (and don’t be scared away by my English – the text in the manual has passed the scrutiny of the documentation team 🙂

Hello World!

Hi! My name is Sergey Petrunia and I’m one of the developers in MySQL Query Optimizer team. If you don’t have a habit of leafing through the commits list, you probably don’t hear much about what we’re doing, and we don’t have your feedback on it.
This blog is my attempt to rectify this situation a bit. Let’s see if it will succeed.