New feature in MariaDB 10.1: ANALYZE statement

Last week, yours truly has pushed a new feature into MariaDB 10.1 tree: ANALYZE statement.

The idea of this feature is to make it easy to compare query plan with query execution. ANALYZE statement will run the statement, and produce EXPLAIN-like output, where optimizer’s estimates are followed by numbers that were observed when running the query. The output looks like this:

analyze-stmt-new-columns.png

Here,

  • Next to rows there is r_rows column which shows how many records were read from the table.
  • Next to filtered there is r_filtered column which shows which fraction of records was left after the part of the WHERE condition attached to the table was checked.

I think this should explain the feature. If you want more details, please refer to the KB article ANALYZE statement. It also discusses the meaning of the above EXPLAIN output.

Technical details and further plans

ANALYZE currently uses its own counters. Counting is done for all queries, including non-ANALYZE queries. This should be okay (not have visible overhead) as long as counting just increments integer variables in the query plan, without doing any atomic operations or making syscalls.

The upside of this approach is that it’s now trivial to make Explain in the slow query log also print ANALYZE output. When a query runs slowly, you will be able to know where exactly the optimizer was wrong.

The downside is that getting more data will not be as easy. So far, the most requested numbers beyond r_rows and r_filtered were r_time(amount of time spent in reading the table) and r_io(amount of IO that we did on the table). Counting the amount of time that was spent while reading each row will impose CPU overhead, it is a known problem. Counting IO is just incrementing a counter, but it will require interaction between ANALYZE code and storage engine(s) code, which will add complexity.

There is PERFORMANCE_SCHEMA feature, where others have already spent a lot of effort to count wait time and IO. It’s tempting to reuse it. The problem is, P_S collects the wrong data. P_S counters are global, while ANALYZE needs to count IO for each table reference separately. Consider a self-join. From P_S point of view, it is reading from the same table. From ANALYZE point of view, it is reads from two different table references. I’m currently not sure whether ANALYZE should/could rely on PERFORMANCE_SCHEMA.

A totally different angle is that tabular EXPLAIN output doesn’t allow to show much data (for example, how many rows were there before/after GROUP BY?). Here the solution is clear, I think: support EXPLAIN FORMAT=JSON and then add ANALYZE FORMAT=JSON where we can provide lots of detail.

Optimizations for derived tables in MySQL 5.6 and MariaDB 5.5+

I had been involved with subquery optimizations fairly closely, but last week I was surprised to find out that MySQL 5.6 does not support derived table merging. This feature was among the subquery features in the abandoned MySQL 6.0. In MariaDB, it was finished and released as part of MariaDB 5.3/5.5. As for MySQL, neither MySQL 5.6, nor MySQL 5.7 has this feature.

So what is this “derived merge”? It’s simple to understand. When one writes complex queries, it is common to use FROM-clause subqueries as a way to structure the query:

select
  sum(o_totalprice)
from
  (select * from orders where o_orderpriority='1-URGENT') as high_prio_orders
where
  o_orderdate between '1995-01-01' and '1995-01-07'

MySQL optimizer processes this syntax very poorly. The basic problem is that FROM-subqueries are always materialized exactly as-specified. Conditions from outside the subquery are applied only after the materialization.

In our example, table orders has an index on o_orderdate, and there is a highly selective condition o_orderdate BETWEEN ... which one can use for reading through the index. But the condition is located outside the subquery, so it will not be used when reading the table. Instead, we will get the following plan:

+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 1505799 | Using where |
|  2 | DERIVED     | orders     | ALL  | NULL          | NULL | NULL    | NULL | 1505799 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+

The meaning of it is:

  1. Do a full table scan is on table `orders`. We expect to read 1.5M rows. Write rows that match o_orderpriority='1-URGENT' into a temporary table
  2. Read the temporary table back. Filter rows that match o_orderdate between ... and compute the query result

MySQL 5.6 has added some improvements to this (link to the manual). They are:

  • The temporary table is materialized as late as possible. This has no effect of the speed of our example query, but it may have an effect for more complex queries.
    • EXPLAIN also will not materialize the temporary table
  • The optimizer has an option to create and use an index on the temporary table.

However, the base problem of materializing FROM subquery before applying any other optimization still remains.

In MariaDB, EXPLAIN will be different:

+------+-------------+--------+-------+---------------+---------------+---------+------+------+------------------------------------+
| id   | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows | Extra                              |
+------+-------------+--------+-------+---------------+---------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | NULL | 4358 | Using index condition; Using where |
+------+-------------+--------+-------+---------------+---------------+---------+------+------+------------------------------------+

Note that we see only one line, and the table orders is accessed through an index on o_orderdate. Running EXPLAIN EXTENDED will show why:

Message: select sum(`dbt3sf1`.`orders`.`o_totalprice`) AS `sum(o_totalprice)` from `dbt3sf1`.`orders` where ((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_orderDATE` between '1995-01-01' and '1995-01-07'))

There is no FROM-clause subquery anymore. It has been merged into the upper select. This allowed the optimizer to avoid doing materialization, and also to use the condition and index on o_orderdate to construct a range access.

Query execution time for this particular example went down from 15 sec to 0.25 sec, but generally, the difference can be as big as your table is big.

Engineering behind EXPLAIN FORMAT=JSON (or lack thereof)

MySQL 5.6 has added support for EXPLAIN FORMAT=JSON. The basic use case for that feature is that one can look at the JSON output and see more details about the query plan. More advanced/specific use cases are difficult, though. The problem is, you can’t predict what EXPLAIN FORMAT=JSON will produce. There is no documentation or any kind of convention regarding the contents of JSON document that you will get.

To make sure I’m not missing something, I looked at MySQL Workbench. MySQL Workbench has a feature called Visual Explain. If you want to use, prepare to seeing this a lot:

visual-explain-error.png

In Workbench 6.1.4 you get it for (almost?) any query with subquery. In Workbench 6.1.6 (released last week), some subqueries work, but it’s still easy to hit a query whose EXPLAIN JSON output confuses workbench.

Looking at the source code, this seems to be just the start of it. The code in MySQL Server is not explicitly concerned with having output of EXPLAIN FORMAT=JSON conform to any convention. Workbench also has a rather ad-hoc “parser” that walks over JSON tree and has certain arbitrary expectations about what nodes should be in various parts of the JSON document. When these two meet, bugs are a certainty. I suspect the real fun will start after a few releases of the Server (fixing stuff and adding new features) and Workbench (trying to catch up with new server while supporting old ones).

My personal interest in all this is that we want to support EXPLAIN JSON in MariaDB. MariaDB optimizer has extra features, so we will have to extend EXPLAIN JSON. I was looking for a way to do it in a compatible way. However, current state of EXPLAIN JSON in MySQL doesn’t give one a chance.

An interesting case in ORDER BY LIMIT optimization

Recently, I was asked about an interesting case in ORDER BY … LIMIT optimization. Consider a table

create table tbl (
  ...
  KEY key1(col1, col2),
  PRIMARY KEY (pk)
) engine=InnoDB;

Consider queries like:

  select * from tbl where col1='foo' and col2=123 order by pk limit 1;
  select * from tbl where col1='bar' and col2=123 order by pk limit 1;

These run nearly instantly. But, if one combines these two queries with col1='foo' and col1='bar' into one query with col1 IN ('foo','bar'):

  select * from tbl where col1 IN ('foo','bar') and col2=123 order by pk limit 1;

then the query is be orders of magnitude slower than both of the queries with col1=const.

The first thing to note when doing investigation is to note that the table uses InnoDB engine, which has extended_keys feature. This means, the index

  KEY key1(col1, col2)

is actually

  KEY key1(col1, col2, pk)

Once you have that, and also you have col1='foo' AND col2=123 in the WHERE clause, the optimizer is able to see that index `key1` produces records ordered by the `pk` column, i.e. in the order required by the ORDER BY clause. This allows to satisfy the LIMIT 1 part by reading just one row.

Now, if we change col1='foo' into col1 IN('foo','bar'), we will still be able to use index `key1`, but the rows we read will not be ordered by `pk`. They will come in two ordered batches:

  'bar', 123, pkX
  'bar', 123, pkX+1
  'bar', 123, pkX+2
  ...
  'foo', 123, pkY
  'foo', 123, pkY+1
  'foo', 123, pkY+2

The query has ORDER BY pk LIMIT 1, but, since the rowset is not ordered by pk, the optimizer will have to read all of the rows, sort them, and find the row with the least value of `pk`.

Now, wouldn’t it be great if the optimizer was aware that the index scan returns two ordered batches? It would be able to read not more than #LIMIT rows from each batch. I can think of two possible execution strategies:

  1. Run something similar to index_merge strategy: start an index scan col1='foo' and an index scan on col1='bar'. Merge the two ordered streams until we’ve found #limit rows. This approach works well when you’re merging a few streams. If there are a lot of streams, the overhead of starting concurrent index scans will start to show up.
  2. Use the same index cursor to #LIMIT rows from the first batch, then from the second, and so forth. Merge these ordered streams using filesort’s merge pass or priority queue. This approach reads more rows than the first one, but we don’t have to create another index cursor.

Now, the question is whether this kind of queries is frequent enough to implement this optimization.

Comparing query optimizer features in MariaDB 10.0 and MySQL 5.6

MariaDB 10.0 had a stable release last month. It is a good time to take a look and see how it compares to the stable version of MySQL, MySQL 5.6 (as for Percona Server, it doesn’t have its own optimizer features).
Changelogs and release notes have all the details, but it’s difficult to see the big picture. So I went for diagrams, and the result is a short article titled What is the difference between MySQL and MariaDB query optimizers. It should give one a clue about what are the recent developments in query optimizers in MySQL world.

In case you’re interested in details about optimizer features in MariaDB 10.0, I’ve shared slides from a talk about MariaDB 10.0 query optimizer.

Pager script for shrinking EXPLAIN output

Everyone who works with MySQL (or MariaDB) query optimizer has to spend a lot of time looking at EXPLAIN outputs. You typically first look at the tabular form, because it is easier to read. You can immediately see what the join order is, what numbers of records will be read, etc:

MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey;
+------+-------------+----------+------+---------------+-------------+---------+----------------------------+--------+-------+
| id   | select_type | table    | type | possible_keys | key         | key_len | ref                        | rows   | Extra |
+------+-------------+----------+------+---------------+-------------+---------+----------------------------+--------+-------+
|    1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                       | 150303 |       |
|    1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3sf1.customer.c_custkey |      7 |       |
+------+-------------+----------+------+---------------+-------------+---------+----------------------------+--------+-------+

The only problem is that it quickly gets too wide and doesn’t fit even on wide screens. To relieve the pain, I wrote the script that shrinks EXPLAIN output by removing spaces and less useful information. You set the script as mysql client pager command:

MariaDB [dbt3sf1]> \P shrink_explain.pl
PAGER set to 'shrink_explain.pl'

and then the above EXPLAIN becomes shorter:

MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey;
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150303|     |
|1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |     |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+

The worst horizontal space hog is the possible_keys column:

MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey and
    -> o_orderkey in (select l_orderkey from lineitem where l_shipdate='1995-01-01');
+------+--------------------+----------+----------------+-----------------------------------------------------------------------------------------------+-------------+---------+----------------------------+--------+-------------+
| id   | select_type        | table    | type           | possible_keys                                                                                 | key         | key_len | ref                        | rows   | Extra       |
+------+--------------------+----------+----------------+-----------------------------------------------------------------------------------------------+-------------+---------+----------------------------+--------+-------------+
|    1 | PRIMARY            | customer | ALL            | PRIMARY                                                                                       | NULL        | NULL    | NULL                       | 150303 |             |
|    1 | PRIMARY            | orders   | ref            | i_o_custkey                                                                                   | i_o_custkey | 5       | dbt3sf1.customer.c_custkey |      7 | Using where |
|    2 | DEPENDENT SUBQUERY | lineitem | index_subquery | PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity,l_shipdate_partkey,l_shipdate_suppkey | PRIMARY     | 4       | func                       |      2 | Using where |
+------+--------------------+----------+----------------+-----------------------------------------------------------------------------------------------+-------------+---------+----------------------------+--------+-------------+

I have added an option to aggressively shrink the possible_keys column:

MariaDB [dbt3sf1]> \P shrink_explain.pl --shrink-possible-keys
PAGER set to 'shrink_explain.pl --shrink-possible-keys'
MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey and
         ->  o_orderkey in (select l_orderkey from lineitem where l_shipdate='1995-01-01');
+--+---------------+--------+--------------+-------------+-----------+-------+------------------+------+-----------+
|id|select_type    |table   |type          |possible_keys|key        |key_len|ref               |rows  |Extra      |
+--+---------------+--------+--------------+-------------+-----------+-------+------------------+------+-----------+
|1 |PRIMARY        |customer|ALL           |PRIMARY      |NULL       |NULL   |NULL              |150303|           |
|1 |PRIMARY        |orders  |ref           |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
|2 |DEPENDENT SUBQ.|lineitem|index_subquery|PRIMARY,i_...|PRIMARY    |4      |func              |2     |Using where|
+--+---------------+--------+--------------+-------------+-----------+-------+------------------+------+-----------+

and there is an option to remove possible_keys altogether:

MariaDB [dbt3sf1]> \P shrink_explain.pl --remove-possible-keys
PAGER set to 'shrink_explain.pl --remove-possible-keys'
MariaDB [dbt3sf1]> explain select * from customer, orders where c_custkey= o_custkey and
         -> o_orderkey in (select l_orderkey from lineitem where l_shipdate='1995-01-01');
+--+---------------+--------+--------------+---+-----------+-------+------------------+------+-----------+
|id|select_type    |table   |type          |...|key        |key_len|ref               |rows  |Extra      |
+--+---------------+--------+--------------+---+-----------+-------+------------------+------+-----------+
|1 |PRIMARY        |customer|ALL           |...|NULL       |NULL   |NULL              |150303|           |
|1 |PRIMARY        |orders  |ref           |...|i_o_custkey|5      |customer.c_custkey|7     |Using where|
|2 |DEPENDENT SUBQ.|lineitem|index_subquery|...|PRIMARY    |4      |func              |2     |Using where|
+--+---------------+--------+--------------+---+-----------+-------+------------------+------+-----------+

The script can be downloaded here: shrink_explain.pl.

SHOW EXPLAIN in MariaDB 10.0 vs EXPLAIN FOR CONNECTION in MySQL 5.7

MariaDB 10.0 has SHOW EXPLAIN feature. It was coded by yours truly and first introduced about a year ago. Last release of MySQL 5.7 introduced EXPLAIN FOR CONNECTION, which looks rather similar to MariaDB’s SHOW EXPLAIN. I was wondering how these two compare.

The basics

The usage scenarios are similar. In both cases, you first need a connection id of a running query. It is typically obtained by running SHOW PROCESSLIST:

MariaDB [(none)]> show processlist;
+----+------+-----------+----------+---------+------+--------------+-----------------------------+----------+
| Id | User | Host      | db       | Command | Time | State        | Info                        | Progress |
+----+------+-----------+----------+---------+------+--------------+-----------------------------+----------+
|  2 | root | localhost | dbt3sf10 | Query   |    2 | Sending data | select count(*) from orders |    0.000 |
|  3 | root | localhost | NULL     | Query   |    0 | init         | show processlist            |    0.000 |
+----+------+-----------+----------+---------+------+--------------+-----------------------------+----------+

Then, in MariaDB, you run:

MariaDB [(none)]> show explain for 2;
+------+-------------+--------+-------+---------------+---------------+---------+------+----------+-------------+
| id   | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows     | Extra       |
+------+-------------+--------+-------+---------------+---------------+---------+------+----------+-------------+
|    1 | SIMPLE      | orders | index | NULL          | i_o_orderdate | 4       | NULL | 14856637 | Using index |
+------+-------------+--------+-------+---------------+---------------+---------+------+----------+-------------+

and you get the EXPLAIN for the query that connection #2 is running. SHOW EXPLAIN always generates a warning, the warning has the text of the query that the EXPLAIN is for:

MariaDB [(none)]> show warnings;
+-------+------+-----------------------------+
| Level | Code | Message                     |
+-------+------+-----------------------------+
| Note  | 1003 | select count(*) from orders |
+-------+------+-----------------------------+

The idea behind this was that without the warning, it will be difficult to tell which query this EXPLAIN belongs to. It could be that the query you saw in SHOW PROCESSLIST has finished and another one has started.

SHOW EXPLAIN works for any query that EXPLAIN works for (starting from MariaDB 10.0.5 EXPLAIN UPDATE/DELETE/INSERT are supported). If you attempt to run SHOW EXPLAIN on a connection that is not running a statement that has EXPLAIN, you will get:

MariaDB [(none)]> show explain for 2;
ERROR 1933 (HY000): Target is not running an EXPLAINable command

Now, let’s take a look at MySQL’s EXPLAIN FOR CONNECTION:

MySQL [(none)]> explain for connection 1;
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 922880 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+

Looks very similar to SHOW EXPLAIN, but there is no warning with query text. The only way out I can see is to run SHOW PROCESSLIST again, find your query there, and look at the “Time” column. If the value is big enough, this means that the query you see in SHOW PROCESSLIST output was already running when you ran SHOW EXPLAIN.

If the target connection is not running a query, you will get nothing

MySQL [(none)]> explain  for connection 1;
Query OK, 0 rows affected (0.00 sec)

More details

Unlike SHOW EXPLAIN, EXPLAIN FOR CONNECTION supports flags. That is, you can run EXPLAIN [EXTENDED|PARTITIONS|FORMAT=JSON] FOR CONNECTION . However, EXPLAIN EXTENDED will not print the warning with query after rewrites, and EXPLAIN FORMAT=JSON will not print attached_condition entries. I think, these are caused by limitations of EXPLAIN FOR CONNECTION code.

There are cases where EXPLAIN FOR CONNECTION will produce outputs like:

MySQL [(none)]> explain for connection 1;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Plan isn't ready yet |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------+

In some cases, a part of the plan is not ready:

MySQL [(none)]> explain for connection 1;
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra                |
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------------+
|  1 | PRIMARY     | NULL     | NULL | NULL          | NULL | NULL    | NULL |   NULL | Plan isn't ready yet |
|  2 | SUBQUERY    | lineitem | ALL  | NULL          | NULL | NULL    | NULL | 974084 | NULL                 |
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------------+

SHOW EXPLAIN in MariaDB could produce similar partial query plans, but after release 10.0.5, there is always a full query plan. It would be interesting to discuss the reasons for this, but the discussion won’t fit into this blog post.

Another interesting question is whether MySQL 5.7’s EXPLAIN FOR CONNECTION allows for printing of query plan into slow query log. Before MariaDB 10.0.5, SHOW EXPLAIN code didn’t allow this. Changes in MariaDB 10.0.5 made printing EXPLAIN at the query end easy, and now MariaDB can save EXPLAINs in the Slow Query Log. Will MySQL 5.7 follow and also add such feature?

A chance for Cassandra Storage Engine

Short version: It looks like DataStax has released an official C++ driver for their new CQL native protocol. This makes it possible to update MariaDB’s Cassandra Storage Engine to work with all column families in Cassandra 1.2 and up.

Long version: In case the above didn’t make any sense to you, here’s some background:

  • Cassandra Storage Engine was developed against Cassandra 1.1
  • Cassandra 1.1 had a client-server protocol based on Thrift API. It had a number of limitations, the most important was lack of support for streaming. These translated into limitations in Cassandra storage engine, for example, I could not support secondary indexes.
  • Cassandra 1.2 was released in February, and it had a new, native CQL protocol. The protocol supported streaming, but there were no C/C++ drivers.
  • Besides that, Cassandra has changed its data model in version 1.2. After this change, Thrift API clients were screwed could only access column families defined with WITH COMPACT STORAGE attribute. This applied to Cassandra Storage Engine as well.

Now, with official C/C++ driver for CQL protocol, it should be possible to update Cassandra Storage Engine to work with new versions of Cassandra. It will take some effort to move from using Thrift API to using CQL, but the required amount of effort just went down a lot.

MySQL 5.6: no testcases for non-crashing bugs, either (and an incomplete bugfix)

I normally don’t write such posts, but this time I’ll do a little whining. I was looking at MySQL’s Bug#69581, which was also filed against MariaDB as MDEV-5102. With some effort (little for this particular bug), one can find out that Bug#69581’s counterpart in Oracle’s internal bug database is Bug#16862316, and the fix is in revision 5366 in MySQL 5.6.14.

There are two problems, though:

  • The fix doesn’t include a testcase. We know, Oracle doesn’t publish testcases for crashing bugs, with a justification that it’s for security reasons. However, this bug is not a crashing one. I’m wondering what was the justification for not including testcase for it?
  • Luckily, the bug entry at bugs.mysql.com has the testcase, so I was able to play with it. As a result, I have found that MySQL’s fix is incomplete, and filed another bug, Bug#70703.

I think, the poor bugfix made by Oracle is not a big deal. Although, this is not the first time we at MariaDB see a poor bugfix by Oracle (Can the forks keep up? Well, they would, if they could trust upstream’s patches and just copy them, without having to check whether the fixes are real fixes or band-aids).

The lack of testcases is a much bigger concern. Has Percona Server merged fix for MySQL Bug#69581 correctly? Well, they don’t have any features related to partitioned tables, so most likely, the answer is Yes. But won’t it be nicer if mysql-test test suite had a testcase for this, and they could check it by running tests? It’s not only about forks/branches. Everybody who changes MySQL/MariaDB/Percona source code benefits from mysql-test testsuite having a good test coverage.

Now, are there any volunteers who could run through MySQL 5.6 changelog and find out whether missing testcase for non-crashing bug was an isolated occurrence, or it is a new trend?

MariaDB 10.0: EXPLAIN in the slow query log

MariaDB can now print query’s EXPLAIN into the slow query log! The feature is based on SHOW EXPLAIN technology and its output looks like this:

# Time: 131017 21:26:02
# User@Host: root[root] @ localhost []
# Thread_id: 2  Schema: dbt3sf1  QC_hit: No
# Query_time: 5.535819  Lock_time: 0.000350  Rows_sent: 1  Rows_examined: 65633
## <explain>
##   id select_type     table   type    possible_keys   key     key_len ref     rows    Extra
##   1  SIMPLE  nation  ref     PRIMARY,n_name  n_name  26      const   1       Using where; Using index
##   1  SIMPLE  customer        ref     PRIMARY,i_c_nationkey   i_c_nationkey   5       dbt3sf1.nation.n_nationkey      25124   Using index
##   1  SIMPLE  orders  ref     i_o_custkey     i_o_custkey     5       dbt3sf1.customer.c_custkey      7       Using index
## </explain>
SET timestamp=1382030762;
select count(*) from customer, orders, nation where c_custkey=o_custkey and c_nationkey=n_nationkey and n_name='GERMANY';

By default, EXPLAIN is not printed to the slow query log, one needs to enable it manually like this (either from SQL or from my.cnf):

set log_slow_verbosity='explain';

Now, I’d like to find time to hack pt-query-digest to make use of EXPLAINs. For start, let it show the number of different query plans. Then, show query plans on the response time distribution… so that one can tell which ones were fast or slow… The stopping factor here is that pt-query-digest is 500KB of unfamiliar Perl code.