In last couple of weeks there has been a tide of ORDER/GROUP BY-related optimization bugs, where I was the fixer or the reviewer. This wasn’t an easy job because there is no sane description of how GROUP BY/ORDER BY handling is supposed to work.
To figure it out, I had to write an explanation of how it works. The first part is about ORDER BY. Hopefully there will be subsequent parts that will show how GROUP BY is related to ORDER BY and how it works.
Available means to produce ordered sequences
MySQL has two methods to produce ordered streams.
The first is to use a “range“, “ref” or “index” access method over an ordered index. For versions up to 5.1, those access methods naturally return records in the index order, so we get ordering for free (the exception is NDB engine which needs to do merge-sort when it gets data from several storage nodes). In MySQL 5.2, MyISAM and InnoDB have MultiRangeRead optimization which breaks the ordering. We have a draft of how to make it preserve the ordering, but at the moment MRR is simply disabled whenever ordering is required.
The second, catch-all method, is to use the filesort algorithm. In a nutshell, filesort() does quicksort on chunks of data that fit into its memory and then uses mergesort approach to merge the chunks. The amount of memory available to filesort() is controlled by @@sort_buffer_size variable. if the sorted data doesn’t fit into memory (i.e. there is more than one chunk), filesort uses a temporary file to store the chunks.
Source data for filesort() always comes from one table. If there is a need to sort data from several tables, MySQL will first collect the data into a temporary table and then invoke filesort() for that temporary table. I don’t know the true reason for this. Codewise, filesort() wants to pull its source data using something like source.get_next_record() function, while join and union runtime produce their using result.put_next_record()-type calls, so maybe the temporary table is there only to resolve this push/pull mismatch and will go away once we get decent cursors.
filesort() has two modes of operation:
- Mode 1: the sorted elements contain all required columns of the source table. The result of the sorting is a linear sequence of output tuples, there is no need to access the source table after the sort is done.
- Mode 2: sort <sort_key, rowid> pairs and produce a sequence of rowids which one can use to get source table’s rows in the required order (but this will be essentially hit the table in random order and is not very fast)
Mode 1 is used whenever possible. Mode is used when mode1 is not applicable. This is the case when the sorted tuples have blobs or variable-length columns (TODO: check w/ evgen). Unfortunately, the EXPLAIN output provides no clue about which mode is used, so you’ll have to manually check for blobs in the output columns list.
Executing join to produce ordered stream
At the moment MySQL has three ways to run a join and produce ordered output:
|Use index-based access method that produces ordered output||no mention of filesort|
|Use filesort() on 1st non-constant table||“Using filesort” in the first row|
|Put join result into a temporary table and use filesort() on it||“Using temporary; Using filesort” in the first row|
Now I’ll cover those three methods in more detail. The first method can be used when the first non-const table in the join order has an index that matches the ORDER BY list. In this case we can use the index, and the ordering will “spread over” other tables onto the output. Here is a swimlane diagram of this process, where different columns represent different values of the ORDER BY expression:
This method is preferred over the other two as it requires no additional sorting steps.
The second method can be used when all ORDER BY elements refer to the first table in the join order. In this case, we can filesort() the first table and then proceed to execute the join:
Here filesort() may operate either in Mode 1 or in Mode 2. One may wonder why this is limited to doing filesort() after the first table. After all, we could do it after the second table as well – produce (tbl1, tbl2) record combinations, put them into temporary table, sort, etc. The expectation is perfectly reasonable but alas, MySQL will not even consider such query plans.
The last, the catch-all method is to write the entire join output into the temporary table and then invoke filesort:
I have an easier time recalling those three strategies when there are pictures, hopefully they’ll help you too. That’s all for today, in next posts I’ll cover the topics of how ordering affects join optimization and interoperates with GROUP BY.
great article, interesting stuff!
“Hopefully there will be subsequent parts that will show how GROUP BY is related to ORDER BY and how it works.”
Yeah I sure hope so. I’m curious…the relationship between GROUP BY and ORDER BY you are referring too, is that intrinsical? Or is it caused by MySQL’s non-standard commitment to return the resulset according to the order implied by the GROUP BY clause (unless an explicit ORDER BY is present)
Another thing I’m curious about is, would it from the Server development’s point of view be more or less complex if MySQL would not commit itself to returning the set in the order implied by the GROUP BY clause?
I have a question on filesort and EXPLAIN. You say ‘Use filesort() on 1st non-constant table -> “Using filesort” in the first row’. So even if the first table in the JOIN is a ‘const’ or ‘system’, the “Using filesort” will appear there, though the filesort may actually happen in a later table?
I’m having trouble finding a query that will produce such an EXPLAIN plan, so I hope this isn’t an obvious question.
> Iâ€™m curiousâ€¦the relationship between GROUP BY and ORDER BY you are referring too, is that intrinsical? Or is it caused by MySQLâ€™s non-standard commitment to return the resulset according to the order implied by the GROUP BY clause.
Both yes and no. I’m going to have this question sorted out (for myself too 🙂 in the mentioned forthcoming posts.
> would it from the Server developmentâ€™s point of view be more or less complex if MySQL would not commit itself to returning the set in the order implied by the GROUP BY clause?
It would be possible to skip the sorting step sometimes, but as long as we use current apporaches to handling GROUP BY, there will be close relationship between GROUP BY and ORDER BY processing. Again, I hope to cover this in the next posts.
> You say â€˜Use filesort() on 1st non-constant table -> â€œUsing filesortâ€ in the first rowâ€™. So even if the first table in the JOIN is a â€˜constâ€™ or â€™systemâ€™, the â€œUsing filesortâ€ will appear there, though the filesort may actually happen in a later table?
Yes (and yes, this is counterintuitive and I hope we’ll fix it at some point)
> Iâ€™m having trouble finding a query that will produce such an EXPLAIN plan, so I hope this isnâ€™t an obvious question.
Here is an example:
mysql> explain select * from const_tbl, t1, t2 where t1.b=const_tbl.a and t2.a=t1.a order by t1.a;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | const_tbl | system | NULL | NULL | NULL | NULL | 1 | Using filesort |
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
| 1 | SIMPLE | t2 | ref | a | a | 5 | test4.t1.a | 11 | Using where |
3 rows in set (0.01 sec)
(note to self: make the comment input box here better, at the moment it looks like it was designed for SMS messages:)
Thanks Sergey. I’ll add this query as a test case to the visual explain tool. This rule shouldn’t be hard to implement.
I wonder why MRR is disabled when ordering is required. I guess getting rows using MRR then filesort() may perform better than disabling MRR in some cases, especially for remote storage engines like NDB, because filesourt() could be faster than round trips between indexes.
BTW, can I cite these your pictures and introduce in my blog to Japanese people? They’re really nice and intuitive 😉
With regards to MRR and ordering:
* MRR/NDB is capable of producing ordered output (it will sort if asked to).
* MyISAM/Maria/Falcon/InnoDB-MRR are incapable of producing records in any particular order. I agree that sometimes it could be faster to use MRR+sort (especially since we don\’t need to sort the complete set, we\’ll need to only restore index ordering within each of the MRR passes). That\’s just not implemented. We\’ve decided to be conservative and disable MRR (other, more agressive option was to force sort if MRR is used)
you\’re welcome to cite/copy pictures.
Thank you for your reply. I’ll copy your picture soon 🙂
I’m looking forward to that MRR+sort will be implemented.
Another thing what I want you to consider to implement in the future is an optimization for LIMIT clause. Currently, LIMIT will take effect after all JOIN and filesort is done. It’s okay for the third example above. However, building a whole JOIN result might be unnecessary for first and second example. If LIMIT is evaluated earlier, roundtrips can be reduced dramatically in some cases.
Suppose that the query has “LIMIT 3” clause. For the first example, MySQL optimizer executes the query like below:
o Fetch all matching rows from tbl1 in index order.
o Fetch all matching rows from tbl2 and join.
o Fetch all matching rows from tbl3 and join.
o Limit rows and returns 3 rows to the client.
If MySQL optimizer has an alternative plan like below, it looks more efficient than now:
o Fetch the first row from tbl1 in index order.
o Fetch matching rows from tbl2 and join.
o Fetch matching rows from tbl3 and join.
o At this point the number of rows is less than 3, then continue.
o Fetch the next row from tbl1 in index order.
o Fetch matching rows from tbl2 and join.
o Fetch matching rows from tbl3 and join.
o At this point the number of rows is greater than or equal to 3, then returns 3 rows to the client.
I see many web applications use LIMIT clause with sort. Having a query plan like this may improve query performance in certain cases IMHO 🙂
How is it going? 🙂
I might completely have misunderstood how NLJ works. It might be opposing. My comment below describes how it works now, doesn’t it? I had a wrong idea like “it firstly scans the driving table. then it fetches the following tables in turn” The my previous wrong idea is very much like BKA, isn’t it? BKA joins fetches all satisfied rows to the WHERE clauns from the driving table firstly, then fetches matching rows from the following table. Is this correct?