A proposal for method of delivering optimizer bug fixes

Working on query optimizer bugs can be a rather frustrating experience. First, as soon as some query doesn’t run as fast it theoretically could people will consider it a bug. On one hand that’s great, you get a constant stream of user input, but on the other hand you end up with a whole pile of “bugs” which you can’t hope to finish.

What’s more frustrating is that even if you manage to create a fix for an optimizer bug, there are chances it won’t be allowed into next GA (currently 5.0.70) or approaching-GA (currently 5.1.30) release (GA is our term for “stable” or “release”).

The reason behind this is that most optimizer bugfixes cause the optimizer to pick different query plans, and there’s no way to guarantee that the fix will be a change for the better for absolutely everyone. Experience shows that it is possible to have a query that hits two optimizer bugs/deficiencies at once in such a way that they cancel each other out, and get problems when one of the bugs is fixed. A more common scenario is when the optimizer makes the best choice but it just doesn’t have all the info. The top five unknowns are

  • data distributions
  • correlations between data columns
  • correlations between data value and physical record order
  • highly selective conditions on non-indexed columns
  • hot/cold caches

Neither of those can be easily checked, so we’re very conservative and have the “no query plan changes in GA versions” rule.

The problem is that it turns out our GA releases aren’t very frequent and one may have to wait a looong time before the fix makes it into official GA release. Bazaar and its ease of creation of publicly-accessible branches have rectified the situation a bit but most users want a binary and also we don’t want to end up with the need to maintain 2^N branches after N optimizer bugs.

The proposal

This aims at query optimizer (“here’s a query which uses a non-optimal plan”-type) bugfixes that affect a small amount of code in small number of places.

  • We’ll put the fix into both GA and next-after-GA versions.
  • For next-after-GA version, just put the fix in, do not support the old behavior. That’s the only feasible long-term option, we can’t afford to support all behavior we’ve had at some point in the past.
  • For the GA version, make it possible to switch new behavior on and off. The default should be the old behavior (so we only put one “if” into the old execution path. Can one hope that *that* won’t break anything?).

The mechanism to turn on the new behavior will be server command line option, something like --with-bugfix=NNNN. It’s possible to ask to turn on multiple bugfixes by using the option several times:

mysqld --with-bugfix=37642  --with-bugfix=13356

or, in my.cnf syntax:

[mysqld]
...
with-bugfix=13356
with-bugfix=27432
...

The code of GA versions doesn’t change much, so it should be tolerable to have, say, twenty “if (bugfix_nnn) {…} else {…}” branches. mysqld binary should only know numbers of bugs which it has switchable fixes for. If it is invoked with –with-bugfix=N where N is not a bug number it knows, it should print issue a warning, something like this:

[Warning] This version doesn't have ability to switch fix BUG#NNNN , see
[Warning]   http://bugs.mysql.com/check-version.php?binary_version=X.Y.Z&bug=NNNN.

Visiting the printed URL gets you to the bugs database which has information about which fixes appeared in which versions, so it can tell you whether your binary already has the fix for BUG#NNNN integrated into the code or you need to upgrade, in which case it can tell you what is the first version that has the needed bugfix.

-end of proposal-

Any comments or feedback on this scheme are welcome. Their impact will be greater if they arrive till September, 17. We’re having a developer meeting on Sept, 17-24 and I’ll try to get this discussed and some decision made about this.

17 thoughts on “A proposal for method of delivering optimizer bug fixes

  1. That’s an interesting idea. Is there a risk of causing further issues by having a mix of bug fixes enabled or disabled?

    I would love to see the check-version.php file created so we can easily see if a bug is fixed in a specific version of MySQL.

    Like

  2. I am not sure I like this proposal. I think this will result in very confusing start up lines and big surprises.

    I think a better approach would be to version “generations of the optimizer”. Similar to in Oracle where for a while you had the choice between the rule based and the cost based optimizer. This way you guys could work on a truely new optimizer that cleans up with the past in a new major version, while keeping the old optimizer around for people that want to get new features but with the old behavior.

    Like

  3. This sounds like a messy idea. I understand your problems and frustration though.

    Perhaps refactoring of code is needed to let the query optimizer become a loadable module. You could then rebuild the optimizer according to your needs and swamp it in/out according to the bug fixed. Would speed up development not having to recompile the server each time too. Just a suggestion.

    Like

  4. If you can more the optimizer a plugin you get some nice extra benefits:

    You can provide users a experimental set of optimizer plugins that can be tested against a GA release. That way you get your feedback without waiting until the next GA and the majority of the users isn’t affected.

    Like

  5. Jan,

    The problem with pluginizing the optimizer a plugin is that there are simply too much interdependencies among its parts (for both good and bad reasons). I strongly doubt that it is possible to define interfaces that could be stable across minor releases and allow to introduce meaningful changes.

    What’s the problem with statically linked binary, anyway? The targeted bugfixes are small, size of the binary shouldn’t be an issue. Do we really doubt our coding so much that we can’t afford to put in an extra “if (bugfix_nnn_is_on) {…}” statement?

    (I think one place where we could use plugins is to delegate away predicate selectivity. Since we don’t have `extra statistics collection or predicate selectivity hints in our plan, we should let somebody else to do them)

    Like

  6. James,

    > This sounds like a messy idea.
    Keep in mind that this is targeting a small amount of localized fixes.

    > Perhaps refactoring of code is needed to let the query optimizer become a loadable module.
    See points raised in previous message, plus the reality is that it is definitely impossible to refactor 5.0 and 5.1 codebases at this point.

    > Would speed up development not having to recompile the server each time too.

    That’s actually not a big concern, as most of the optimizer is localized in a handful of files. It would be really fantastic if we got somebody in the optimizer team whose development was limited by the speed for server re-compilations 🙂

    Like

  7. Lukas,

    > I think a better approach would be to version “generations of the optimizer”. Similar to in Oracle where for a while you had the choice between the rule based and the cost based optimizer. This way you guys could work on a truely new optimizer that cleans up with the past in a new major version, while keeping the old optimizer around for people that want to get new features but with the old behavior.

    Yes, we already have that for new 6.0 code. It is possible turn on/off every single feature: subquery optimizations, index condition pushdown, Batched Key Acccess are all controlled by their respective @@server_variables. There is no uniformity/consistency in the variable names at the moment, but we have a worklog entry with a unified optimizer configuration scheme. This proposal was targeted specifically at 5.1 and 5.0.

    Like

  8. Eric,

    > That’s an interesting idea. Is there a risk of causing further issues by having a mix of bug fixes enabled or disabled?

    Theoretically, yes there is, but it should be a rare case. I expect such issues to be repeatable “poor query plan”-type problems, and not e.g. random crashes. In any case, by the time you’re playing with enabling fixes, you already have issues, so there’s not much to lose. And if a particular combination of fixes doesn’t work well, it will now be easy to move away from it.

    Like

  9. Sergey,

    Thanks for looking into the problem. Indeed rate of optimizer bugfixes in 5.0 and 5.1 is poor and you surely have the challenge because you can’t really validate your fixed on large variety of workloads until version is used by a lot of users but this is also the time when it becomes risky to change anything.

    –with-bugfix would be reasonable however I’m not sure how easy you would get it for all bug fixes – will not your turn your code to spaghetti causing development complication and potentially breaking some old code.

    When fixes are behavior changes some variable configuration could be even better because it would be possible to change behavior on per connection basics.

    Another important thing I should note – when you guys add any methods to optimizer make sure it is possible to FORCE using this method when it is possible even if Optimizer does not think it is good. For example forcing to use index merge instead of single index.

    Like

  10. This idea imho is will make the compatibility problems even worse. And the test suites even longer.
    I don’t think it’s wise to remedy organizational bottlenecks with technical fixes. The main problem here is the release cycle.
    Maybe the remedy is to “modularize” the server ? Like release certain components independently of others. Optimizer x.y.z, runtime a.b.c, etc.
    Note that it’s not strictly speaking necessary to have the components in separate binaries. It’s enough if people would know what to expect imho.

    Like

  11. It would be good to see optimizer fixes released quicker.. I’m wondering if instead of –with-bugfix=1234, it might be better to use –without-bugfix=12345.

    I believe new fixes like this should be opt-out, instead of opt-in.

    Like

  12. Sergey,

    I think
    –with-bugfix=12454
    it may be too criptic.

    Imagine my.cnf with

    [mysqld]

    with-bugfix=13445
    with-bugfix=13455
    with-bugfix=13645
    with-bugfix=13745

    May be you can add descriptive name ?

    like
    with-bugfix=”add index merge fix”

    etc ?

    Like

  13. Sergey my point was that I think its messy and confusing and a deployment nightmare to make every little bit an option here. In the end it seems that this puts the burden on the developer to choose the right mix. I rather prefer if we optimizer would be provided in generations, where older generations are still available as one package in later versions (to be phased out eventually of course).

    Like

  14. Peter,

    > –with-bugfix would be reasonable however I’m not sure how easy you would get it for all bug fixes – will not your turn your code to spaghetti causing development complication and potentially breaking some old code.

    We are not going to add this for every fix. We’ll add it only 1) for query speed bugs (fixes for crashes will be on unconditionally) and 2) when that was requested by a paying customer or when the fix is assumed to be important enough for big set of people.

    > When fixes are behavior changes some variable configuration could be even better because it would be possible to change behavior on per connection basics.

    Yes.. perhaps a standard system/user variable would be better. Got to decide whether we need one variable for all or separate variables for each fix.

    > Another important thing I should note – when you guys add any methods to optimizer make sure it is possible to FORCE using this method when it is possible even if Optimizer does not think it is good. For example forcing to use index merge instead of single index.

    It would be an interesting challenge for e.g. subquery optimizations.. it’s fairly straightforward to force materialization but there’s no obvious way how one could specify that other strategies should [not] be used. Ok, we’ll take another look at this.

    Like

  15. Mark,

    > When will we get optimizer tracing? That would make it much easier to diagnose these problems.

    ATM we don’t have any plans we’ve committed to but better diagnostics is high in our list of desired features.

    Like

Leave a comment