UCASE() is now sargable in MariaDB. LCASE() can’t be.

Short: UCASE(key_col)=… is now sargable in MariaDB. One can ignore the UCASE() call here, but only for a few collations. One might expect this to work for any _ci (case-insensitive) collation, but it doesn’t. Surprisingly, LCASE() cannot be ignored even in utf8_general_ci.

Long: Some months ago, we got a set of customer queries that had comparisons like

UCASE(tbl1.key_col) = UCASE(tbl2.key_col)

The optimizer couldn’t process this condition at all, so it was computing a cross join. Also, it wasn’t able to estimate the condition selectivity, so the rest of the query plan was bad, too.

But, tbl1.key_col’s collation used a case-insensitive collation utf8mb3_general_ci, which gave an idea, perhaps

UPPER(tbl1.key_col)=something is equivalent to tbl1.key_col=something

After investigation with Alexander Barkov, we’ve discovered that this holds for utf8mb3_general_ci collation. But other case-insensitive collations have exceptions to this. For example, in Czech collation utf8mb3_czech_ci, 'CH' compares as if it was a single letter between ‘H’ and ‘I’. But 'cH' is not treated this way, which means UCASE('cH') will compare differently from 'cH'. MDEV-31494 has more details and more examples.

One would think LCASE() would be the same. It is not. Even in utf8mb3_general_ci, there are characters that compare differently from their LCASE(). The first example is the Angstrom Sign. It compares as different from letter A:

MariaDB [test]> select 'A'='Å' ;
+-----------+
| 'A'='Å'   |
+-----------+
|         0 |
+-----------+

The LCASE of Angstrom Sign is “Latin Small Letter A with Ring Above”, which compares as equal to letter a:

MariaDB [test]> select LCASE('A'), LCASE('Å'), LCASE('A')=LCASE('Å') ;
+------------+--------------+-------------------------+
| LCASE('A') | LCASE('Å')   | LCASE('A')=LCASE('Å')   |
+------------+--------------+-------------------------+
| a          | å            |                       1 |
+------------+--------------+-------------------------+

There are other similar examples. MDEV-31955 has the details.

Note that here LCASE(x)=... is true, and x= is not. If we do searches using x=... , we will miss rows for which LCASE(x)=... , so the rewrite cannot be done.

Leave a comment