Sheeri wrote a blog post that claims that “IN Subqueries in MySQL 5.6 Are Optimized Away” and uses that as a basis to conclude that subquery optimizations in MySQL 5.6 are superior to MariaDB’s.
The claim is incorrect (and so is the conclusion). As a person who has coded both of the mentioned FirstMatch and semi-join materialization, I think I need to write about this.
- “So MariaDB recognizes the subquery and optimizes it. But it is still optimized as a subquery”
- “In MySQL 5.6, the subquery is actually optimized away”
The first statement is somewhat true. The second one is not. I’ll try to explain. The example subquery Sheeri used was:
SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)
Its meaning is “find films that have actors in table film_actor”. It is not possible to “optimize away” the subquery here. Not more than it’s possible to take the expression “2*3” and optimize away the “*3” part of it. The subquery affects the result of the whole query. You can’t remove it.
What the optimizer (both in MariaDB and in MySQL 5.6) does here is to convert the subquery into a semi-join. Semi-join is basically a “subquery in the WHERE clause” (read the link for more details), and it gives the optimizer more possible choices.
Semi-join can be seen in EXPLAIN EXTENDED. In both MariaDB and MySQL, one can see:
... select `film`.`title` AS `title` from `sakila`.`film` semi join (`sakila`.`film_actor`) ...
But what about different query plans? They do not show superiority of one optimizer over the other. As indicated in documentation, MariaDB supports the FirstMatch strategy that was the chosen by MySQL 5.6. Also, MySQL 5.6 supports semi-join materialization strategy that was picked by MariaDB. I suspect, different query plans were chosen because MariaDB and MySQL use different cost formulas. It is not possible to tell whose formulas are better, because both query plans finish in 0.01 seconds, and the tables are very small.
Which means, this example doesn’t allow one to conclude that MySQL’s subquery optimizations are superior to MariaDB (or vice versa) QED.
Addendum. Query plan used by MySQL will read *exactly* the same rows from the tables that MySQL 5.1/5.5 (which have no subquery optimizations) would read. The best you can deduce here is that MySQL 5.6’s subquery optimizations are not making things worse.
Is it necessary to write a blog post directing a point to me? If not, there’s no reason to have my name in the title, that makes it extremely personal.
Also, my conclusion is not that MySQL is better, just that it optimized the query away, as per the title. If you read the comments, you’d see that I did an EXPLAIN EXTENDED and MariaDB 5.5 rewrites the query as a semijoin, asme as MySQL 5.6, but the semijoins are obviously handled differently.
I still think it’s great that the Oracle team is able to simplify the query like this. I don’t think a personal attack is warranted (you could just have taken my name out of the title and referenced my post), and as usual, whether or not the performance is better depends on the workload and particular subqueries you use.
I do think Oracle’s work is promising. (MariaDB’s work is too, which I’ve said before, and it’s why we are updating to MariaDB 5.5 for our applications that make heavy use of subqueries).
And of course, I haven’t played with Maria 10 yet either. I’ve been researching MySQL 5.6 for the OurSQL podcast because it’s rumored that there’s a GA coming soon (Rob Young said “early 2013”), and I thought this feature was cool.
It would be an unfair comparison for me to say “MySQL 5.6 does it better than MariaDB 5.5”, I would have to compare MySQL 5.6 to MariaDB 10 to be fair. I also threw in MySQL 5.1. The point was to show how it’s evolving. If I’d done a blog post about MySQL 5.5 vs. MySQL 5.6 and saying it optimizes to a semijoin, people would complain and say MariaDB does that too. So I tried to be fair and put the MariaDB stuff in, and apparently that caused a problem too.
Great post Sergey, very informative. It would be great to have a comparison of all the optimizer features between MariaDB 5.5 and MySQL 5.6…
Sergey, that’s a really interesting post. Out of curiosity have you tried this with say a million records in each table?
I would normally not put people’s names, but this time the published statements were… wrong.
> my conclusion is not that MySQL is better,
but reading your post leaves that impression. And it’s not only for me, others had such impression, also.
> It would be an unfair comparison for me to say â€œMySQL 5.6 does it better than MariaDB 5.5â€³,
I don’t think so (PeterZ actually did a comparison, will post a link below), *as long as it really does it better in a meaningful way*.
> So I tried to be fair and put the MariaDB stuff in, and apparently that caused a problem too.
The problem was not with mentioning MariaDB. I’m also perfectly ok with anybody posting that “MariaDB is worse/slower” as long as *that has some relationship to reality*.
The problem was with very wrong statements. As I have pointed out, the subquery in question is not “optimized away” by either of the optimizers. They basically do the same conversion. They both consider both query plans.
The query plan you’ve got in MySQL 5.6 is to do essentially the same what MySQL 5.1
did. Yet, reading your blog, I get an impression that getting that query plan makes 5.6 totally
amazing. If somebody uninformed is reading this, they are getting a very wrong idea. This is why I reacted this way.
@madmax, yes it would be! Unfortunately, there doesn’t seem to be anybody who would make comparisons regularly. Oracle’s closeness doesn’t help here. Black-box testing (or reading the source code) would take a lot of time.
We at MariaDB try to maintain a basic feature comparison matrix here:
there was also a comparison made by Percona in April last year:
but it is almost a year old now.
I have not. The problem with “scaling up” that query is that
* the subquery has no WHERE clause
* the query’s WHERE clause only has the subquery.
Current dataset is such that there are 1000 films. 997 of them have actors, so the query returns 997 rows. If there were 1M films, the query would have returned 997K rows. I don’t think that’s very realistic.
Typically there are selective predicates, outside the subquery, inside the subquery, or both inside and outside.
It’s interesting to try with those, but this adds a lot of choices one needs to make. I would really love if somebody who’s neither Oracle nor at MariaDB took a reasonably big dataset, realistic queries, and reported how they worked.