MariaDB 5.3 is now GA, and MariaDB 5.5 is RC. One of the primary features in these releases is all-round coverage with subquery optimizations. Practically every kind of subquery available in SQL has got some new optimization.
We do a lot of testing, so these new optimizations should be now reasonably stable. What is missing is performance testing with real-world queries on real-world data. I expect most of the time you will see a speedup, however, there can also be cases where the new version will be slower. New optimizations make query plan search space much bigger, this means the new optimizer will have lots of room to make errors where previously was none.
Back at MySQL Ab, I could use bugs/support cases to do some analysis of how real-world queries are affected by the new optimizations.
Now, we don’t have access to that data anymore, and so are asking for help: If you’ve got some queries with subqueries, please try running them on the latest MariaDB 5.3 Stable or MariaDB 5.5 RC and let us know of the results.
You can find some of my real-world cases tests here: http://mysqlmaniac.com/2012/what-about-the-subqueries/
Results are quite promising, however I couldn’t see any improvements when testing queries with double subqueries or with many UNIONs.
LikeLike
@Przemek, thanks a lot! I and the rest of MariaDB team really appreciate getting feedback on our work.
Subquery cache should definitely work for double subqueries and UNIONs, I have no idea why you’re not seeing the speedup.
Perhaps, the dataset is such that the cache is not useful?
If you’re going to re-run the UNIONs/double-subquery cases, could you check the values of status variables subquery_cache_hit subquery_cache_miss before/after the query? (they show how subquery cache worked, see http://kb.askmonty.org/en/subquery-cache for details)
LikeLike
Ok, here is example query where subquery_cache made no difference in execution time for me when MySQL did choose the wrong indexes:
SELECT DISTINCT `page`.page_namespace AS page_namespace,
`page`.page_title AS page_title,
`page`.page_id AS page_id,
rev_user,
rev_user_text,
rev_comment,
rev_timestamp
FROM `revision` AS rev,
`page`
WHERE 1 = 1
AND `page`.page_namespace IN ( ‘0’, ‘0’ )
AND `page`.page_is_redirect = 0
AND `page`.page_id = rev.rev_page
AND rev.rev_timestamp = (SELECT Min(rev_aux.rev_timestamp)
FROM `revision` AS rev_aux
WHERE rev_aux.rev_page = rev.rev_page)
AND ‘Stryker117’ = (SELECT rev_user_text
FROM `revision`
WHERE `revision`.rev_page = page_id
ORDER BY `revision`.rev_timestamp ASC
LIMIT 1)
AND `page`.page_namespace NOT IN ( 1100 )
ORDER BY rev_timestamp DESC
LIMIT 0, 5;
EXPLAIN:
+——+——————–+———-+——-+————————+—————-+———+——————-+——-+———————————————————————+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+——+——————–+———-+——-+————————+—————-+———+——————-+——-+———————————————————————+
| 1 | PRIMARY | page | range | PRIMARY,name_title | name_title | 4 | NULL | 34382 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | PRIMARY | rev | ref | PRIMARY,page_timestamp | PRIMARY | 4 | halo.page.page_id | 2 | Using where |
| 3 | DEPENDENT SUBQUERY | revision | index | PRIMARY,page_timestamp | rev_timestamp | 14 | NULL | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | rev_aux | ref | PRIMARY,page_timestamp | page_timestamp | 4 | halo.rev.rev_page | 2 | Using index |
+——+——————–+———-+——-+————————+—————-+———+——————-+——-+———————————————————————+
When subquery_cache=off the status variables were of course equal to 0 after query run:
show status like ‘sub%’;
+———————+——-+
| Variable_name | Value |
+———————+——-+
| Subquery_cache_hit | 0 |
| Subquery_cache_miss | 0 |
+———————+——-+
and with subquery_cache=on they were:
+———————+——-+
| Variable_name | Value |
+———————+——-+
| Subquery_cache_hit | 9608 |
| Subquery_cache_miss | 710 |
+———————+——-+
However in both cases the execution time was about the same (~1 hour 36 min).
But after I forced the proper index usage:
+——+——————–+———-+——-+————————+————+———+——————-+——-+———————————————————————+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+——+——————–+———-+——-+————————+————+———+——————-+——-+———————————————————————+
| 1 | PRIMARY | page | range | PRIMARY,name_title | name_title | 4 | NULL | 34382 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | PRIMARY | rev | ref | PRIMARY,page_timestamp | PRIMARY | 4 | halo.page.page_id | 4 | Using where |
| 3 | DEPENDENT SUBQUERY | revision | ref | PRIMARY | PRIMARY | 4 | halo.page.page_id | 4 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | rev_aux | ref | PRIMARY | PRIMARY | 4 | halo.rev.rev_page | 4 | |
+——+——————–+———-+——-+————————+————+———+——————-+——-+———————————————————————+
Times were bit better with subquery_cache enabled indeed (Subquery_cache_hit/miss the same as before):
w/o subquery_cache:
2.88 sec (cold buffer pool)
1.22 sec (warm buffer pool)
with subquery_cache:
2.38 sec (cold buffer pool)
0.77 sec (warm buffer pool)
LikeLike