Notable optimizer fixes released in February, 2024

MariaDB has made new releases of stable versions in February. Like with previous release batch, I want to list notable changes in the query optimizer:

MDEV-32958: Unusable key notes do not get reported for some operations

This is a follow-up to MDEV-32203 I’ve covered for the previous release: MariaDB now emits a warning for conditions in form indexed_column CMP_OP const that are unusable for the optimizer. The most common case where they are not usable is varchar_column=INTEGER_CONSTANT but there are less obvious cases as well, like mismatched collations.

The original patch failed to produce the warning in some some cases. Now, this is fixed.

Writing varchar_col=INTEGER_CONSTANT looks like a newbie mistake, but it is not. I’ve encountered several such cases in the last couple of months alone. They were in fairly complex and well-written queries. MDEV-32203 was a good idea.

MDEV-33118: Add optimizer_adjust_secondary_key_costs variable

This is added to address poor join query plans. Consider a query plan using ref access:

select * from t1, t2 where t2.key1=t1.col1 and t2.key2='foo'
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref     | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL    | 1000 | Using where |
|    1 | SIMPLE      | t2    | ref  | key1          | key1 | 5       | t1.col1 | 200  | Using where |
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------+

When computing cost of reading table t2 by doing index lookups using t2.key1=t1.col1, MariaDB tried to take into account that some of the reads would hit the cache. Basically, the total cost of all lookups was capped by “worst_seeks” value which was a function of how much we would read from table t2 if we read its matching rows “independently” of table t1.
However this cap didn’t apply for all possible ref accesses. ref accesses that have a constant key part (like t2.key2='foo' in this example) “borrowed” #rows and cost estimate from the range optimizer, and that number was not capped.

This resulted in very poor query plan choices in some scenarios. The visible effect was that the optimizer picked an obviously bad ref access plan when a better option was clearly present.

Another related issue was the relative costs of clustered index scans and secondary index scans. Secondary index scans cost was too low.

Both of these issues are fixed in The Big Cost Model Rewrite in MariaDB 11.0. But if one can’t upgrade to 11.0 yet, they can get these fixes in MariaDB 10.6+ by setting optimizer_adjust_secondary_key_costs accordingly.

MDEV-32148: Inefficient WHERE timestamp_column = datetime_const_expr

MariaDB (and MySQL) has two datatypes for storing points in time: TIMESTAMP and DATETIME.

DATETIME is “YYYY-MM-DD HH:MM:SS” value, without specifying which time zone it is in.

TIMESTAMP is a point in time. It is “the number of [micro]seconds since midnight January 1st, 1970 GMT”. When you read a TIMESTAMP column, it is converted to ‘YYYY-MM-DD …’ datetime in your local @@session.time_zone.

Consider Query-1 which compares a timestamp column with a datetime literal:

SELECT ... FROM tbl WHERE timestamp_column <= 'YYYY-MM-DD ...'

Here, for each row considered, MariaDB would convert the value of timestamp_column into a DATETIME structure consisting of Year, Month, Date, …. and then compare it with the DATETIME structure representing ‘YYYY-MM-DD…’. But if TIMESTAMPs are just integers, why not compare as TIMESTAMPs instead?

This is surprisingly complex. First, DATETIME values have a wider range: they span from year 0000 to 9999 while TIMESTAMP covers only 1970 to 2038. Second, DST time changes mean that some DATETIME values map to two possible TIMESTAMP points-in-time: one before the clock is moved backwards, and one after. When the clock is moved forward, there are DATETIME values that do not map to any TIMESTAMP.

The patch for MDEV-32148 carefully takes all these limitations into account and makes queries like Query-1 use TIMESTAMP comparisons whenever it’s safe.

Leave a comment