Short: UCASE(key_col)=…
is now sargable in MariaDB. One can ignore the UCASE()
call here, but only for a few collations. One might expect this to work for any _ci (case-insensitive) collation, but it doesn’t. Surprisingly, LCASE()
cannot be ignored even in utf8_general_ci
.
Long: Some months ago, we got a set of customer queries that had comparisons like
UCASE(tbl1.key_col) = UCASE(tbl2.key_col)
The optimizer couldn’t process this condition at all, so it was computing a cross join. Also, it wasn’t able to estimate the condition selectivity, so the rest of the query plan was bad, too.
But, tbl1.key_col’s collation used a case-insensitive collation utf8mb3_general_ci
, which gave an idea, perhaps
UPPER(tbl1.key_col)=something
is equivalent to tbl1.key_col=something
After investigation with Alexander Barkov, we’ve discovered that this holds for utf8mb3_general_ci
collation. But other case-insensitive collations have exceptions to this. For example, in Czech collation utf8mb3_czech_ci
, 'CH'
compares as if it was a single letter between ‘H’ and ‘I’. But 'cH'
is not treated this way, which means UCASE('cH')
will compare differently from 'cH'.
MDEV-31494 has more details and more examples.
One would think LCASE() would be the same. It is not. Even in utf8mb3_general_ci
, there are characters that compare differently from their LCASE(). The first example is the Angstrom Sign. It compares as different from letter A:
MariaDB [test]> select 'A'='Å' ;
+-----------+
| 'A'='Å' |
+-----------+
| 0 |
+-----------+
The LCASE of Angstrom Sign is “Latin Small Letter A with Ring Above”, which compares as equal to letter a:
MariaDB [test]> select LCASE('A'), LCASE('Å'), LCASE('A')=LCASE('Å') ;
+------------+--------------+-------------------------+
| LCASE('A') | LCASE('Å') | LCASE('A')=LCASE('Å') |
+------------+--------------+-------------------------+
| a | å | 1 |
+------------+--------------+-------------------------+
There are other similar examples. MDEV-31955 has the details.
Note that here LCASE(x)=...
is true, and x=
is not. If we do searches using x=..
. , we will miss rows for which LCASE(x)=...
, so the rewrite cannot be done.