MariaDB 5.1 beta is almost out, so it’s time to cover some of its features. The biggest optimizer feature is MWL#17 Table Elimination.
The basic idea behind table elimination is that sometimes it is possible to resolve the query without even accessing some of the tables that the query refers to. One can invent many kinds of such cases, but in Table Elimination we targeted only a certain class of SQL constructs that one ends up writing when they are querying highly-normalized data.
The sample queries were drawn from “Anchor Modeling”, a database modeling technique which takes normalization to the extreme. The slides at anchor modeling website have an in-depth explanation of Anchor modeling and its merits, but the part that’s important for table elimination can be shown with an example.
Suppose the database stores information about actors, together with their names, birthdates, and ratings, where ratings can change over time:
According to anchor modeling, each attribute should go into its own table:
-- the 'anchor' table which has only synthetic primary key create table ac_anchor(AC_ID int primary key); -- a table for 'name' attribute: create table ac_name(AC_ID int, ACNAM_name char(N), primary key(AC_ID)); -- a table for 'birthdate' attribute: create table ac_dob(AC_ID int, ACDOB_birthdate date, primary key(AC_ID)); -- a table for 'rating' attribute, which is historized: create table ac_rating(AC_ID int, ACRAT_rating int, ACRAT_fromdate date, primary key(AC_ID, ACRAT_fromdate));
With this approach it becomes easy to add/change/remove attributes, but this comes at a cost of added complexity in querying the data: in order to answer the simplest, select-star question of displaying actors and their current ratings one has to write outer joins:
-- Display actors, with their names and current ratings select ac_anchor.AC_ID, ACNAM_Name, ACDOB_birthdate, ACRAT_rating from ac_anchor left join ac_name on ac_anchor.AC_ID=ac_name.AC_ID left join ac_dob on ac_anchor.AC_ID=ac_dob.AC_ID left join ac_rating on (ac_anchor.AC_ID=ac_rating.AC_ID and ac_rating.ACRAT_fromdate = (select max(sub.ACRAT_fromdate) from ac_rating sub where sub.AC_ID = ac_rating.AC_ID))
Apparently one won’t want to write such join every time they need to access actor’s properties, so they’ll create a view:
create view actors as select ac_anchor.AC_ID, ACNAM_Name, ACDOB_birthdate, ACRAT_rating from <see the select above>
which will allow one to access data as if it was stored in a regular way:
select ACRAT_rating from actors where ACNAM_name='Gary Oldman'
And this is where table elimination will be needed
Table elimination
The first thing the optimizer will do will be to merge the VIEW definition into the query and obtain
select ACRAT_rating from ac_anchor left join ac_name on ac_anchor.AC_ID=ac_name.AC_ID left join ac_dob on ac_anchor.AC_ID=ac_dob.AC_ID left join ac_rating on (ac_anchor.AC_ID=ac_rating.AC_ID and ac_rating.ACRAT_fromdate = (select max(sub.ACRAT_fromdate) from ac_rating sub where sub.AC_ID = ac_rating.AC_ID)) where ACNAM_name='Gary Oldman'
Now, it’s important to realize that the obtained query has a useless part (highlighted in magenta). Indeed,
left join ac_dob on ac_dob.AC_ID=...
will produce exactly one matching record:primary key(ac_dob.AC_ID)
guarantees that there will be at most one match for any value of ac_anchor.AC_ID,- and if there won’t be a match,
LEFT JOIN
will generate a NULL-complemented “row”,
- and we don’t care what the matching record is, as table ac_dob is not used anywhere else in the query
This means that the … left join ac_dob on … part can be removed from the query and this is what Table Elimination module does. The detection logic is rather smart, for example it would be able to remove the … left join ac_rating on … part as well, together with the subquery (in the above example it won’t be removed because ac_rating used in the selection list of the query). Table Elimination module can also handle nested outer joins and multi-table outer joins.
User interface
One can check that table elimination is working by looking at the output of EXPLAIN [EXTENDED] and not finding there the tables that were eliminated:
MySQL [test]> explain select ACRAT_rating from actors where ACNAM_name='Gary Oldman'; +----+--------------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ | 1 | PRIMARY | ac_anchor | index | PRIMARY | PRIMARY | 4 | NULL | 2 | Using index | | 1 | PRIMARY | ac_name | eq_ref | PRIMARY | PRIMARY | 4 | test.ac_anchor.AC_ID | 1 | Using where | | 1 | PRIMARY | ac_rating | ref | PRIMARY | PRIMARY | 4 | test.ac_anchor.AC_ID | 1 | | | 3 | DEPENDENT SUBQUERY | sub | ref | PRIMARY | PRIMARY | 4 | test.ac_rating.AC_ID | 1 | Using index | +----+--------------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ 4 rows in set (0.01 sec)
Note that ac_dob
table is not in the output. Now let’s try getting birthdate instead:
MySQL [test]> explain select ACDOB_birthdate from actors where ACNAM_name='Gary Oldman'; +----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ | 1 | PRIMARY | ac_anchor | index | PRIMARY | PRIMARY | 4 | NULL | 2 | Using index | | 1 | PRIMARY | ac_name | eq_ref | PRIMARY | PRIMARY | 4 | test.ac_anchor.AC_ID | 1 | Using where | | 1 | PRIMARY | ac_dob | eq_ref | PRIMARY | PRIMARY | 4 | test.ac_anchor.AC_ID | 1 | | +----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ 3 rows in set (0.01 sec)
The ac_dob table is there while ac_rating and the subquery are gone. Now, if we just want to check the name of the actor
MySQL [test]> explain select count(*) from actors where ACNAM_name='Gary Oldman'; +----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ | 1 | PRIMARY | ac_anchor | index | PRIMARY | PRIMARY | 4 | NULL | 2 | Using index | | 1 | PRIMARY | ac_name | eq_ref | PRIMARY | PRIMARY | 4 | test.ac_anchor.AC_ID | 1 | Using where | +----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+ 2 rows in set (0.01 sec)
then it will eliminate both ac_dob and ac_rating tables.
Removing tables from a query is not expected to make the query slower, and it does not cut off any optimization opportunities, so we’ve made table elimination unconditional and don’t plan on having any kind of query hints for it.
I wanted to add an @@optimizer_switch
flag anyway, just in case, but Monty was against that and eventually we have agreed that @@optimizer_switch
will have table_elimination=on|off
switch only in debug builds of the server.
How did you reason about the implications for “select for update” ?
LikeLike
Nice optimizer improvement – fwiw PostgreSQL 8.5alpha2 contains a very similiar planner improvement called “join removal” in the release notes:
http://developer.postgresql.org/pgdocs/postgres/release-8-5.html
LikeLike
Jonas,
The SQL layer will not touch any records in eliminated tables, so no locks will
be placed on any records from eliminated tables.
With regards to correctness of this, first I doubt that there is a clear
definition of what should happen with SELECT … FOR UPDATE with outer joins
(PostgreSQL would give error when one attempts to do this IIRC).
If we stretch the FOR UPDATE definition to cover outer joins, then I still
think current behavior is correct: one of the necessary conditions to
eliminate a table is that it must not be referred from the select list. I
interpret it as that we selected nothing from the eliminated table and hence
don’t need to place any locks.
SELECT .. FOR UPDATE won’t place any locks for tables inside subqueries,
including FROM-subqueries, why inner sides of outer joins that don’t get
to select list should be any better?
LikeLike
Stefan,
Thanks for the pointer. It would be interesting to compare the features.
LikeLike
Hi,
Maybe you remeber me, we talked about this feature after your presentation at FOSDEM.
Inspired by your presentation i have made a test with PostgreSQL. Our upcoming release 9.0 contains a similar feature, we called it join removal (as Stefan Kaltenbrunner said).
http://akretschmer.blogspot.com/2010/02/join-removal-short-performance-test.html
LikeLike
Hi Andreas,
Yes I remember our talk. Thanks for the the link and the performance test. One problem that I have with such testing is that the speedup number is somewhat arbitrary: one could make the eliminated table bigger by putting bigger TEXT data into it, and/or one could use an example where join removal would remove multiple tables.
I would argue that your example shows least possible speedup as the example removes only one table with relatively small columns đŸ™‚
LikeLike
About 2 minutes into the tutorial “8. Demonstrating Table Elimination” http://screenr.com/iQG we show a second kind of table elimination, dependent on foreign keys being declared in the attributes, and putting a condition on the attribute. This is demonstrated in SQL Server 2005/2008. Is this kind of table elimination available in MariaDB as well?
Read more, and watch more tutorials, on http://www.anchormodeling.com (and don’t forget to try our open source online modeling tool while you’re there).
LikeLike