Optimizer-related fixes in current batch of stable releases

(EDIT: added also MDEV-32203)

MariaDB released has made a batch of releases in the stable series. I wanted to list notable query optimizer improvements there, together with some comments.

MDEV-29693: ANALYZE TABLE still flushes table definition cache when engine-independent statistics is used

If one has a long running query and at the same time runs ANALYZE TABLE to collect EITS statistics (think histograms) in parallel with it, this would cause all new queries to wait until the long running query finishes.

Why was it so? In-memory representation of column histograms (and other EITS statistics) was shared between all threads. It didn’t look safe to replace the statistics while some other threads could be using them. The solution was: whenever ANALYZE TABLE had collected new statistics, it forced the old statistics out. This meant stopping any new queries and waiting for currently running queries to finish. Then, new statistics were loaded and new queries unblocked.

The new, more performant solution is to allow different threads to use different “versions” of statistics objects. Looking back, I can’t help wondering why MariaDB provides MVCC to its users
but has sometimes to do roll-our-own-baby-MVCC internally.

MDEV-31303: Key not used when IN clause has both signed and unsigned values

The optimizer couldn’t handle conditions like

unsigned_key_value IN ( small_constant, big_constant, ...)

where big_constant has the upper bit set (i.e. is greater than 2^(datatype_bits-1)) and small_constant doesn’t. The cause was that semantic analysis treated small_constant as signed integer and big_constant as unsigned. Non-uniform treatment made the predicate non-sargable. This limitation is now lifted.

MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref access

I’ve covered this in my previous post: Making “tbl.utf8mb3_key_column=utf8mb4_expr” sargable.

MDEV-32351: Significant slowdown for query with many outer joins

This was a quite complicated bug involving LEFT JOINs, join buffer and ON expressions depending only on the left join’s outer table. Apparently a query pattern like this:

t1 LEFT JOIN t2 ON ( condition_thats_rarely_true(t1.columns) AND ... )

is used by real-world queries (Magento), and MariaDB’s Block-Nested-Loops join even had optimization for such cases. This optimization was accidentally disabled by another bugfix a few releases ago.
Alas, there was no test coverage for it, so it wasn’t detected. Now, we do have test coverage.

MDEV-32475: Skip sorting if we will read one row

This came from a customer case. Consider a subquery in form:

  SELECT 
    (SELECT value
     FROM t1
     WHERE 
       t1.pk1=outer_table.col1 AND t1.pk2=outer_table.col2
     ORDER BY t1.col LIMIT 1
    ) 
  FROM
    outer_table ...

Table t1 has PRIMARY KEY(pk1, pk2). The subquery’s WHERE condition covers the whole Primary Key, so we know we will need to read just one row. However, the optimizer failed to detect that and tried to find the best way to handle ORDER BY … LIMIT. Here, it would manage to make a mistake and pick some index other than PRIMARY KEY. Reading that index produced records in ORDER BY clause order, but took much longer.

It’s risky to touch ORDER BY... LIMIT optimization code in stable versions, so we didn’t do that. What we could do was to make the optimizer detect that there is pk_part_n=outer_ref for every part of the Primary Key, and skip ORDER BY...LIMIT handling altogether. Outer_ref here is important as pk_part_n=constant were handled. Outer_ref is not a constant but it still “binds” the primary key column and makes sorting unnecessary. Btw, it turned out MySQL has the same kludge. Does this mean their ORDER BY … LIMIT optimization code has the same kind of gap?

MDEV-32203: Raise notes when an index cannot be used on data type mismatch

It’s surprising that this is done only now in 2023. The issue is ages old. One can use an incorrect column datatype and/or collation, for example store integers in a VARCHAR column:

create table t1 (
  varchar_col varchar(100);
  ...
);
...
select * from t1 where t1.varchar_col=123;

This looks a rookie mistake but we saw it happen by accident at paying customers, too. Selection queries produce the intended results most of the time (or even all the time, depending on your data). But under the hood, the comparison is done using double datatype. If one has added an INDEX(varchar_col), the optimizer won’t be able to use it, as it would have to search for all possible ways one could write the number 123: ‘123.0’, ‘123.000’, ‘+123’ and so forth. The EXPLAIN will show that index isn’t used:

MariaDB [test]> explain select * from t1 where t1.varchar_col=123;
+------+-------------+-------+------+-----------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys   | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+-----------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | ALL  | varchar_col_idx | NULL | NULL    | NULL | 1000 | Using where |
+------+-------------+-------+------+-----------------+------+---------+------+------+-------------+

After the fix for MDEV-32203 the EXPLAIN will produce a warning:

MariaDB [test]> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1105
Message: Cannot use key `varchar_col_idx` part[0] for lookup: `test`.`t1`.`varchar_col` of type `varchar` = "123" of type `int`

Non-EXPLAIN statements won’t produce warnings by default (compatibility!), but one make them produce warnings as well using the new note_verbosity variable. Also, it’s possible to get these warnings in the Slow Query Log by adjusting log_slow_verbosity system variable.

The fix covers equalities between columns as well: t1.varchar_col=t2.int_col will produce a similar warning. The fix for some reason doesn’t cover IN clauses, varchar_col IN (1,2,3)is not handled. I’m going to check if this was intentional.

Leave a comment