MySQL Optimizer and Prepared Statements

Performance-wise, the idea of Prepared Statements is that the server does certain pre-processing on PREPARE command, and then those steps are omitted when the statement is executed. If the statement is executed several times, we get an economy of

cost_of_prepare_preprocessing * (#statement_executions – 1)

This makes one want to move the CPU and IO-intensive query optimization into the PREPARE phase. Unfortunately, this will make the optimizer work much worse – optimizer’s decisions are based on the external information, and there is much less available information at PREPARE phase. The most crucial differences are that

  • The values of the ‘?’ parameter markers are not yet known
  • The results of probes done in the the queried tables cannot be relied on because the table data may change before the EXECUTE
  • [less important] Table and index statistics may change before the EXECUTE

Those limitations cause the most of MySQL’s optimizations to be re-done at every EXECUTE command. To be precise, here is a table of what is done when:

Action When
Query parse PREPARE
Negation elimination PREPARE
Subquery re-writes PREPARE
Nested JOIN simplification First EXECUTE
OUTER->INNER JOIN conversions First EXECUTE
Partition pruning Every EXECUTE
COUNT/MIN/MAX elimination Every EXECUTE
Constant subexpression removal Every EXECUTE
Equality propagation Every EXECUTE
Constant table detection Every EXECUTE
ref access analysis Every EXECUTE
range/index_merge analysis and optimization Every EXECUTE
Join optimization Every EXECUTE

Basically, the current rule is as follows:

Optimizations that depend only on query syntax are performed either at PREPARE or at first EXECUTE. Optimizations that also depend on something else are performed at every EXECUTE.

If you’re only interested in the current state of affairs, that’s all you need to know. If you want to know what [I think] is going happen in the future, read on.

The future

The above quoted rule is not a DBMS law of nature. In fact, it is already getting in the way of subquery rewrites (see e.g. BUG#27430, WL#3740) and some other server subsystems. We’ve recently had a discussion about other possible solutions. The smartest and most elegant solution was suggested by Timour:

  • Let every optimization record its “dependencies” – information that it depends on:
    • ref access analyzer: set of table indexes
    • constant table detector: the above + the set of tables that have 1 or 0 rows
    • range optimizer: all of the above + used values of parameter markers(if any) + results of records_in_range calls
    • … and so forth…
  • On subsequent EXECUTE, check which dependencies have changed and re-do the correponding optimizations.

This approach will reduce the work done on EXECUTE step to doing a set of checks. In most cases, the checks will pass, and no re-optimization will be required. This approach is hard to implement, however. MySQL’s internal data structures do not naturally support “block undo”, so it is not easy to cancel and re-do an optimization.

For now we’ve settled for a less impressive solution:

  • We’ll gradually move query rewrites that depend on table DDLs into PREPARE phase
  • On EXECUTE we’ll check if DDLs of the used tables have changed. If they have, we’ll throw out the Prepared Statement and re-prepare it.

Re-preparing the statement is expensive but it should be a very rare occurence for most applications. This task is filed as WL#3813: Re-prepare the prepared statement if its underlying objects have changed.

6 Comments

  1. Lukas says:

    A few points:

    1) many people are using prepared statements as a way to protect against SQL injection. in this light they are often using prepared statements for single shot queries
    2) as you point out, different data will lead to different optimal query plans, however you also do not want to replan the query everytime you get different data

    As such it seems to me you would want to do a couple of things:
    1) give the user a way to decide if he wants a general plan or one that is optimized to the data set in the very first execute
    2) ideally go in the direction you mentioned above and store multiple execution plan’s as the user is executing statements with data sets that require different execution plans

    While 2) is fairly hard to do, 1) should be fairly “easy”. PostgreSQL provides this due to their different handling of named and unnamed prepared statements. IIRC Oracle since 9i will always defer execution planning until the very first execute.

    Like

  2. sergey says:

    > many people are using prepared statements as a way to
    > protect against SQL injection. in this light they are
    > often using prepared statements for single shot queries

    Perhaps this case should be handled by using “client side” Prepared
    Statements – handling the PS on the client will spare one
    clientserver roundtrip.

    > 1) give the user a way to decide if he wants a general plan
    > or one that is optimized to the data set in the very first
    > execute

    The problem is MySQL’s Query Plans may stronly depend on properties like

    ‘table X has exactly one row with values (X,Y,Z)’

    so the query plan from 1st EXECUTE may be not valid for the second EXECUTE.
    We will need dependency checker to handle cases like this.

    When this is handled and there are no QEP-destroying changes, we will have
    the choice between re-running the [expensive] range/join optimization or
    re-using their results from the previous execution. This decision can be
    made based on user input (I’d prefer explicit hints over named/unnamed PS).

    (if we allow to re-use optimization we’ll need to make the join/range/ref
    optimizers to treat the ‘?’ markers as “unknown constants” though)

    Thanks also for info on Postgres and Oracle.

    I’ll amend the WL text based on this info and ideas.

    Like

  3. Rick James says:

    I like the approach mentioned in Comment 2. From my experience, I would say that well over 90% of QEPs are identical, regardless of the constants supplied. I conclude that the “hang onto prepare” hint would be ‘the right thing’ most of the time.

    This observation comes from monitoring the slowlogs on dozens of unrelated MySQL servers, collecting them over time, and comparing the latest 50 EXPLAINs (or at least running EXPLAIN all at once on the various SELECTs). Rarely do the QEPs differ by anything substantive.

    OTOH, I have seen some statements come up with as many as 4 different QEPs. Sometimes the diff is in picking INDEX (a,b) vs INDEX (a,c) when all it wants it to lookup by “a”. Question: Might the optimizer pick a different INDEX based on what index blocks happen to be cached at the moment?

    Like

  4. Stan Vassilev says:

    >Perhaps this case should be handled by using
    >“client side” Prepared
    >Statements – handling the PS on the client will
    >spare one
    >clientserver roundtrip.

    What he means is that prepared statements protects from exploits which can’t be solved on the client side. MySQL’s escaping will not detect charset/collation changes at “runtime”, and continue to escape for the charset/collation the connection opened at.

    Prepared statements dont’ suffer from this as a binary format.

    Best of both worlds would be a query format which includes a statement part, and then a binary data part, all delivered as a single message in a single roundtrip.

    Like

  5. Pingback: please click the following webpage

Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s