SHOW EXPLAIN and skeletons in EXPLAIN's closet

I believe I’m nearly done with the SHOW EXPLAIN feature. The idea is that if you’ve got some long-running query $LONG_QUERY running in connection $LONG_QUERY_CONN, you should be able to create another connection, run SHOW EXPLAIN FOR $LONG_QUERY_CONN and see what query plan is being used to run the $LONG_QUERY.

How is this different from just running explain $LONG_QUERY? First, you don’t need to replicate the exact environment that $LONG_QUERY was run in: you don’t need to figure out what values it had for @@optimizer_switch and other settings, what were the contents of its temporary tables, if any, what did InnoDB told the optimizer about the table statistics, etc.

Another, indirect benefit is that we will now be able to produce query’s EXPLAIN at arbitrary point in time, which should make it possible to store it in the slow query log, or support EXPLAIN ANALYZE command.

To the uninformed, coding SHOW EXPLAIN feature may seem trivial: one will only need to walk the query plan and print it out. In reality, it is not as easy: first, you need to take into account that MySQL/MariaDB has a habit of optimizing parts of query plan lazily, also, it is eager to free parts of query plan that are no longer needed. Then, there are slight differences between data structures used by EXPLAINs and regular SELECTs.

It also turns out EXPLAIN had some skeletons in its closet. We have been testing SHOW EXPLAIN by comparing its output with EXPLAIN output, and I have already found two cases where EXPLAIN’s output doesn’t match the query execution. Which is scary, because I personally used to trust EXPLAIN, and everybody I worked with trusted it, too.

I’ve fixed (in MariaDB) one of the wrong EXPLAINs, MDEV-410, since it relates to subqueries and we have a lot of expertise in that code. The second one, MDEV-326, remains unfixed, because it’s in the sorting/grouping code and I’m afraid I can open a big can of worms if I touch it.

Please try your subqueries on MariaDB

MariaDB 5.3 is now GA, and MariaDB 5.5 is RC. One of the primary features in these releases is all-round coverage with subquery optimizations. Practically every kind of subquery available in SQL has got some new optimization.

We do a lot of testing, so these new optimizations should be now reasonably stable. What is missing is performance testing with real-world queries on real-world data. I expect most of the time you will see a speedup, however, there can also be cases where the new version will be slower. New optimizations make query plan search space much bigger, this means the new optimizer will have lots of room to make errors where previously was none.

Back at MySQL Ab, I could use bugs/support cases to do some analysis of how real-world queries are affected by the new optimizations.

Now, we don’t have access to that data anymore, and so are asking for help: If you’ve got some queries with subqueries, please try running them on the latest MariaDB 5.3 Stable or MariaDB 5.5 RC and let us know of the results.

Statistics counters for Multi Range Read

MariaDB 5.3 has now three statistics counters for Multi Range Read optimization:

MariaDB [test]> show status like 'Handler_mrr%';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| Handler_mrr_extra_key_sorts   | 0     |
| Handler_mrr_extra_rowid_sorts | 0     |
| Handler_mrr_init              | 0     |
+-------------------------------+-------+
3 rows in set (0.08 sec)

I’ve just added the first two. The reason for having them is as follows: the point of MRR is to provide speedup over regular execution by doing reads in disk order. In order to make reads in disk order, MRR needs buffer space where it accumulates and sorts read requests. If there are too many read requests to fit into the buffer, MRR will make multiple accumulate-sort-read passes.

Doing multiple passes allows MRR to operate when having limited buffer space, but the speedup will be not as great as with one big disk-ordered read sweep. The purpose of Handler_mrr_extra_key_sorts and Handler_mrr_extra_rowid_sorts is to count the additional accumulate-sort-read passes, so you’re able to tell if you will benefit from increasing your @@mrr_buffer_size and @@join_buffer_size settings.

There are two counters, _extra_key_sorts and _extra_rowid_sorts, because MariaDB has two places where it will do sorting:

  1. sort rowids before reading table records
  2. sort key values before making a bunch of index lookups

MRR code will try to distribute buffer space between them in an optimal way. The decision is a guess based on the available statistics, and can be wrong. Having both counters will allow us to check how the guess will work in practice.

p.s. if you could not make any sense of anything above, try reading Multi Range Read page in our knowlegebase. We have just put there a hopefully-readable explanation of what MRR is.

On complex optimizations and optimizer hints

In an ideal world, you should never need to use optimizer hints. In the real world, hints are necessary. We needed hints even before we’ve made a release. The first requests came from our Quality engineer who complained about it being difficult to hit particular join orders while using a particular subquery strategy.

So, why not add the needed hints? MySQL already has the STRAIGHT_JOIN and FORCE INDEX, you can add more as required. The problem is that once you have transformations that change the query to be far enough from the original SQL, you don’t have a “natural” place or syntax for hints anymore.

For example, if you have a join

SELECT ... FROM table1, table2, table3 WHERE ....

and want a join order of (table2, table3, table1), you can write it as

SELECT ... FROM table2 STRAIGHT_JOIN table3 STRAIGHT_JOIN table1

and that gives the optimizer the join order while still looking like SQL. Now, how do you force the same join order when the query looks like this:

SELECT ...
FROM table1
WHERE
  table1.column IN (SELECT table2.col1
                    FROM table2
                    WHERE
                      table2.col2 IN (SELECT table3.col3
                                      FROM table3)

The obvious answers are

  1. you have to put table2, table3, table1 together somewhere
  2. your hint syntax won’t be SQL

And since you now need to invent the whole new non-SQL syntax to specify query plans, the task becomes just too hard. It’s difficult to design a future-proof compact syntax for specifying query plans. My not-really-solution to the problem was to

  • Add a @@debug_optimizer_prefer_join_prefix system variable. That way, I avoided messing with the parser
  • Do it in a never-to-be-pushed branch of MariaDB, so this solution is not exposed to the same rigor as everything that goes into release

As a result, you can make the optimizer to “strongly prefer” some particular join prefixes. The preferences are applied to each select, though. If you have a UNION and want to have one preference for on branch of it, and another preference for the other, there is no way to achieve that.

Given this kind of limitations, I think the patch is useful for development or debugging, but not for production use. Everything (documentation, link to launchpad tree) are here, at the mariadb-53-optimizer-debugging page.

MariaDB 5.3: documentation updated

With MariaDB 5.3.3 Release Candidate out of the door, I could give some attention to documentation and improve the master What is MariaDB 5.3 page and pages linked from it.

I think the part about subquery optimizations should be fairly easy to read now, and our tech writer Daniel Bartholomew also did a pass over it. If you have a picture-type mind like I do, there is plenty of pictures, including the Subquery optimizations map.

For now, my goal was to just have a descriptions of all optimizations in place. We have also done substantial amount of benchmarking, but that data still waits to be processed and published.

Attempt to split @@optimizer_switch

Three years ago MySQL has got @@optimizer_switch variable. It was introduced in MySQL 5.1.34, that is, we needed it so much that we’ve added it into the stable release.

In a nutshell,@@optimizer_switch held a comma-separated list of optimizer controls:

mysq> > select @@optimizer_switch;
+------------------------------------------------------------------------------------------+
| @@optimizer_switch                                                                       |
+------------------------------------------------------------------------------------------+
| index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on|
+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

One could set all settings at once:

mysql> set optimizer_switch='index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=off';
Query OK, 0 rows affected (0.01 sec)

or set individual flags:

mysql> set optimizer_switch='index_merge_sort_union=off';
Query OK, 0 rows affected (0.01 sec)

The reasons for putting all optimizer parameters into one variable were:

  • make them distinct from all other settings
  • make it possible to run “SET optimizer_switch=default” and reset the optimizer to its default settings (which are not necessarily all “on”)

@@optimizer_switch solution allowed all that, and was very useful in optimizer development and troubleshooting. However, it is becoming a victim of its own success. In current development version of MariaDB @@optimizer_switch has 26 flags, and we’re thinking of adding at least two more before MariaDB 5.3 release. It now looks like this:

MariaDB [test]&> select @@optimizer_switch;
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,
index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,
derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=off,
in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,
subquery_cache=on,mrr=on,mrr_cost_based=off,mrr_sort_keys=on,outer_join_with_cache=off,
semijoin_with_cache=off,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,
optimize_join_buffer_size=on,table_elimination=on
1 row in set (0.00 sec)

It is rather difficult to check the value of some particular flag. Also, there is no easy way to get all settings for subquery optimization flags (other than knowing them by heart and checking each of them).

We at MariaDB are having a discussion about switching from single @@optimizer_switch variable to a set of variables like this:

optimizer.index_merge=on
optimizer.index_merge.union=on
optimizer.index_merge.sort_union=on
optimizer.index_merge.intersection=on
optimizer.index_merge.sort_intersection=off
optimizer.index_condition_pushdown=on
optimizer.join_cache.bka=on
optimizer.join_cache.hashed=on
optimizer.join_cache.incremental=on
optimizer.join_cache.optimize_buffer_size=on
optimizer.join_cache.outer_join=off
optimizer.join_cache.semijoin=off
optimizer.mrr=on
optimizer.mrr.cost_based=off
optimizer.mrr.sort_keys=on
optimizer.semijoin=on
optimizer.semijoin.firstmatch=on
optimizer.semijoin.loosescan=on
optimizer.subquery.cache=on
optimizer.subquery.in_to_exists=on
optimizer.subquery.materialization=off
optimizer.subquery.partial_match_rowid_merge=on
optimizer.subquery.partial_match_table_scan=on
optimizer.table_elimination=on

There are various opinions about how we could make the switch while remaining compatible with the current approach, whether we should switch at all/the optimizer_switch variable/all variables to dotted.name.notation, etc.
If you have an opinion, now it is a good time to voice it at maria-developers@lists.launchpad.net.

MariaDB at FOSDEM: slides, booth questions, etc.

At FOSDEM 2010, MariaDB was represented by Kristian Nielsen and me. Together with Vlad Kolesnikov we’ve manned the joint MariaDB & PBXT stand. I’ve had the main track talk titled “MariaDB: extra features that make it a better branch of MySQL” (slides), and
Kristian had a more general talk titled “Beyond MySQL GA: patches, storage engines, forks, and pre-releases” (slides) in MySQL devroom.

There were no other MySQL-related main track talks, so overall, MySQL ecosystem was represented by one devroom, one stand, and one talk. This is exactly as much as a certain other open source database had, with the exception that our stand didn’t have anything to match the blue plush elephants and pencils and mugs.

I’ve taken notes about questions asked at the stand, here they are:

  • The most common question: So how do I upgrade from MySQL to MariaDB? People are pleased with the answer.
  • A complaint about that it is hard to get data out of a corrupted innodb database, or do anything about a corrupted database (as I understood the visitor wanted a kind of CHECK/REPAIR TABLE[SPACE] command for InnoDB)
  • A request to allow to start/stop the slow query log without restarting the server. The idea is that one doesn’t want to have the slow query log turned on at all times but wants to be able to examine performance problems then he sees them, hence the need to start/stop logging without restarting the server. (UPDATE: it is already possible to do in MySQL/MariaDB. See comments for details)
  • A complaint about poor performance of stored procedures. Unfortunately, it was not feasible to figure out what exactly was being slow, stored procedure runtime itself, or cursor implementation, or queries that the stored procedure executed, or something else. I was only able to answer that MariaDB doesn’t have any enhancements in stored procedure handling at this point.
  • Does MariaDB have any improvements for VIEW handling? The complaint was about poor performance because MySQL “recalculates VIEWs every time”. I’m not sure what is really the problem here, either it is lack of Materialized VIEWs (with indexes on them?), or known poor optimization in case where VIEW is not mergeable
  • Do we have any plans to support transactional DDL statements? (no)
  • Does MariaDB have any improvements in query cache? (no)
  • Have we in MariaDB fixed a certain MySQL bug (I’ve lost the bug#) with triggers? (no)
  • Is it possible to have indexes on MariaDB’s virtual columns? (only when the column is “stored”. indexing a non-stored virtual column would essentially give one functional indexes, and MariaDB doesn’t support them at the moment)

Due to being one of the three people manning the stand I’ve missed most of the MySQL devroom. It would be nice to learn if there were any interesting discussions there.