Partition pruning tip: Use comparisons of column, not of partitioning function value

A customer issue has drawn my attention to this this pecularity: if partitioning is done by the value of some function, then partition pruning module will make use of comparisons of the partitioning column(s), but not of comparisons of the value of the partitioning function. Here is an example:

  PARTITION p0 VALUES LESS THAN ( TO_DAYS('2007-01-01') ),
  PARTITION p1 VALUES LESS THAN ( TO_DAYS('2007-02-01') ),
| id | select_type | table | partitions | type |
|  1 | SIMPLE      | t1    | p1         | ALL  |

| id | select_type | table | partitions     | type |
|  1 | simple      | t1    | p0,p1,p2,p3,p4 | all  |

This also applies to cases where partitioning function is a function of multiple columns. This gives us this tip:

For partition pruning to work, the WHERE clause must use comparisons of columns used by partitioning function, not of the result value of the partitioning function.

This doesn’t look like a totally natural limitation, but that is what is in effect for the time being.

The impact of this limitation and whether it can be lifted

The limitation can be easily worked around in cases where partitioning function is unary and monotonic, i.e where one can easily convert back and forth between comparison of partitioning function:

  monotonic_part_func(const1) < monotonic_part_func(part_col) < monotonic_part_func(const2)

and comparison of partitioning column:

const1 < part_col < const2

Partition pruning module can do such conversions, too, so it is not difficult to make MySQL handle such cases automatically. The case with non-monotonic or n-ary partitioning function is more difficult. Partition pruning uses range analysis over column ordering, that is, it analyzes the WHERE clause, converts it to a disjoint list of intervals like:

(a < part_col < b) OR (part_col = c) OR ...

and then finds which partitions those intervals fall into. We could switch to part_func(part_col) ordering, i.e. collect intervals like

(a < part_func(part_col) < b) OR (part_func(part_col) = c) OR ...

but this will make it hard to handle predicates like “a < part_col < b” (remember I’m talking about non-monotonic case, where X < Y does not mean that part_func(X) < part_func(Y)).

We could do two partition pruning passes, one with part_col ordering and the other with part_func(part_col) ordering, but that is slow, ugly, and will not handle cases like

part_col=const1 OR part_func(part_col)=const2

We could stop doing the range analysis altogether and operate on sets of used partitions, that is, recursively process the WHERE clause using a set of rules like this:

# AND/OR recursion
partitions(cond1 AND cond2) := intersect(partitions(cond1), partitions(cond2))
partitions(cond1 OR cond2) := union(partitions(cond1), partitions(cond2))

# Various atomic predicates
partitions(part_col = C) := { partition_no(part_func(C)) }
partitions(part_col < C) := ...

but that will be slow, especially when there are many partitions. Also this technique will work poorly (or get very complicated) in cases where partitioning function uses several columns, and comparisons of those columns are on different AND/OR branches in the WHERE clause.

Neither solution is prefect, and each of them adds some overhead. It seems we’ll need to collect some user input before we decide what (if anything) we should do.

1 Comment

  1. Kevin Burton says:

    Wow…. that’s a bit frightening actually… thanks for posting this.

    We’re just about to deploy some new code that does range datetime sorting and WHERE partition pruning.

    I don’t think this would have bitten us though.


Leave a Comment

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s