Performance-wise, the idea of Prepared Statements is that the server does certain pre-processing at PREPARE time, and those steps are then omitted when the statement is executed. If the statement is executed several times, the total saving is
cost_of_prepare_preprocessing * (#statement_executions – 1)
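In MySQL's SQL syntax, the flow looks like this (a minimal sketch; the table and values are made up):

```sql
-- Parsing and pre-processing happen once, at PREPARE time
PREPARE stmt FROM 'SELECT * FROM orders WHERE customer_id = ?';

-- Each EXECUTE skips the work that PREPARE has already done
SET @id = 1;
EXECUTE stmt USING @id;
SET @id = 2;
EXECUTE stmt USING @id;

DEALLOCATE PREPARE stmt;
```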
This makes one want to move the CPU- and IO-intensive query optimization into the PREPARE phase. Unfortunately, that would make the optimizer perform much worse: the optimizer’s decisions are based on external information, and much less of that information is available at PREPARE time. The most crucial differences are that
- The values of the ‘?’ parameter markers are not yet known (see the example after this list)
- The results of probes done in the queried tables cannot be relied on, because the table data may change before the EXECUTE
- [less important] Table and index statistics may change before the EXECUTE
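To see why the unknown marker values matter: for the statement below, the best access method depends entirely on the value bound at EXECUTE time, so it cannot be chosen at PREPARE time (a hypothetical table, for illustration):

```sql
CREATE TABLE t1 (a INT, b VARCHAR(100), KEY(a));
PREPARE stmt FROM 'SELECT * FROM t1 WHERE a < ?';

SET @v = 5;        -- selective: a range scan over KEY(a) is cheap
EXECUTE stmt USING @v;

SET @v = 1000000;  -- matches almost all rows: a full scan is cheaper
EXECUTE stmt USING @v;
```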
Those limitations cause most of MySQL’s optimizations to be re-done at every EXECUTE command. To be precise, here is a table of what is done when:
| Action | When |
|---|---|
| Query parse | PREPARE |
| Negation elimination | PREPARE |
| Subquery re-writes | PREPARE |
| Nested JOIN simplification | First EXECUTE |
| OUTER->INNER JOIN conversions | First EXECUTE |
| Partition pruning | Every EXECUTE |
| COUNT/MIN/MAX elimination | Every EXECUTE |
| Constant subexpression removal | Every EXECUTE |
| Equality propagation | Every EXECUTE |
| Constant table detection | Every EXECUTE |
| ref access analysis | Every EXECUTE |
| range/index_merge analysis and optimization | Every EXECUTE |
Basically, the current rule is as follows:
Optimizations that depend only on query syntax are performed either at PREPARE or at first EXECUTE. Optimizations that also depend on something else are performed at every EXECUTE.
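For instance, negation elimination needs nothing but the statement text, while range analysis also needs the marker values and table probes (a small illustration, assuming a table t2 with an indexed column a):

```sql
-- Syntax-only: NOT (a > 5) can be rewritten to a <= 5 at PREPARE
-- time without looking at any table data
PREPARE s1 FROM 'SELECT * FROM t2 WHERE NOT (a > 5)';

-- Data-dependent: the selectivity of "a < ?" is unknown until the
-- marker is bound, so range analysis must run at every EXECUTE
PREPARE s2 FROM 'SELECT * FROM t2 WHERE a < ?';
```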
If you’re only interested in the current state of affairs, that’s all you need to know. If you want to know what [I think] is going to happen in the future, read on.
The future
The above-quoted rule is not a DBMS law of nature. In fact, it is already getting in the way of subquery rewrites (see e.g. BUG#27430, WL#3740) and some other server subsystems. We’ve recently had a discussion about other possible solutions. The smartest and most elegant solution was suggested by Timour:
- Let every optimization record its “dependencies” – the information it depends on:
- ref access analyzer: the set of table indexes
- constant table detector: the above + the set of tables that have 1 or 0 rows (illustrated in the sketch after this list)
- range optimizer: all of the above + the used values of parameter markers (if any) + the results of records_in_range calls
- … and so forth…
- On a subsequent EXECUTE, check which dependencies have changed and re-do the corresponding optimizations.
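A sketch of how one of these dependencies can be invalidated between executions (hypothetical tables, for illustration):

```sql
CREATE TABLE tiny (a INT);
CREATE TABLE big  (b INT, KEY(b));
INSERT INTO tiny VALUES (1);   -- 'tiny' has exactly one row

PREPARE s FROM 'SELECT * FROM tiny, big WHERE big.b = tiny.a';
EXECUTE s;  -- 'tiny' is read at optimization time (constant table),
            -- and 'big' can be planned as ref access on b = const

INSERT INTO tiny VALUES (2);   -- the dependency "tiny has 0 or 1 rows"
EXECUTE s;  -- no longer holds, so under this scheme constant table
            -- detection (and everything built on it) is re-done
```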
This approach will reduce the work done at the EXECUTE step to a set of checks. In most cases the checks will pass, and no re-optimization will be required. The approach is hard to implement, however: MySQL’s internal data structures do not naturally support “block undo”, so it is not easy to cancel and re-do an optimization.
For now we’ve settled for a less impressive solution:
- We’ll gradually move query rewrites that depend on table DDLs into PREPARE phase
- On EXECUTE we’ll check if DDLs of the used tables have changed. If they have, we’ll throw out the Prepared Statement and re-prepare it.
Re-preparing the statement is expensive, but it should be a very rare occurrence for most applications. This task is filed as WL#3813: Re-prepare the prepared statement if its underlying objects have changed.
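In SQL terms, the intended behavior would look like this (a hypothetical table; this sketches the planned behavior, not what the server does today):

```sql
CREATE TABLE t3 (a INT);
PREPARE s FROM 'SELECT * FROM t3 WHERE a = 7';
EXECUTE s;                     -- planned as a full scan: no index yet

ALTER TABLE t3 ADD INDEX (a);  -- DDL of a used table has changed

EXECUTE s;                     -- the server detects the DDL change,
                               -- throws out the old statement and
                               -- re-prepares; ref access on the new
                               -- index is now possible
```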
A few points:
1) Many people are using prepared statements as a way to protect against SQL injection. In this light, they are often using prepared statements for single-shot queries.
2) As you point out, different data will lead to different optimal query plans; however, you also do not want to re-plan the query every time you get different data.
As such it seems to me you would want to do a couple of things:
1) give the user a way to decide whether he wants a general plan or one that is optimized for the data set in the very first execute
2) ideally, go in the direction you mentioned above and store multiple execution plans as the user executes statements with data sets that require different execution plans
While 2) is fairly hard to do, 1) should be fairly “easy”. PostgreSQL provides this through its different handling of named and unnamed prepared statements. IIRC, Oracle since 9i always defers execution planning until the very first execute.
> many people are using prepared statements as a way to
> protect against SQL injection. in this light they are
> often using prepared statements for single shot queries
Perhaps this case should be handled by using “client side” Prepared Statements – handling the PS on the client will spare one client-server roundtrip.
> 1) give the user a way to decide if he wants a general plan
> or one that is optimized to the data set in the very first
> execute
The problem is that MySQL’s Query Plans may strongly depend on properties like ‘table X has exactly one row with values (X,Y,Z)’, so the query plan from the 1st EXECUTE may not be valid for the second EXECUTE. We will need a dependency checker to handle cases like this.
When this is handled and there are no QEP-destroying changes, we will have the choice between re-running the [expensive] range/join optimization or re-using the results from the previous execution. This decision can be made based on user input (I’d prefer explicit hints over named/unnamed PS). (If we allow re-using optimization results, we’ll need to make the join/range/ref optimizers treat the ‘?’ markers as “unknown constants”, though.)
Thanks also for info on Postgres and Oracle.
I’ll amend the WL text based on this info and ideas.
I like the approach mentioned in Comment 2. From my experience, I would say that well over 90% of QEPs are identical, regardless of the constants supplied. I conclude that the “hang onto prepare” hint would be ‘the right thing’ most of the time.
This observation comes from monitoring the slowlogs on dozens of unrelated MySQL servers, collecting them over time, and comparing the latest 50 EXPLAINs (or at least running EXPLAIN all at once on the various SELECTs). Rarely do the QEPs differ by anything substantive.
OTOH, I have seen some statements come up with as many as 4 different QEPs. Sometimes the difference is in picking INDEX (a,b) vs INDEX (a,c) when all the query wants is to look up by “a”. Question: might the optimizer pick a different INDEX based on which index blocks happen to be cached at the moment?
> Perhaps this case should be handled by using “client side” Prepared
> Statements – handling the PS on the client will spare one
> client-server roundtrip.
What he means is that prepared statements protect against exploits that can’t be solved on the client side. MySQL’s escaping will not detect charset/collation changes at “runtime”, and will continue to escape for the charset/collation the connection was opened with.
Prepared statements don’t suffer from this, since they use a binary format.
The best of both worlds would be a query format that includes a statement part and then a binary data part, all delivered as a single message in a single roundtrip.