Quarterly SingingSQL Newsletter #2
If
you wish to unsubscribe at any time, just drop me a line at dantow@singingsql.com. I’ll likely eventually
get around to setting up a proper email group, through some third party, so
that people can subscribe and unsubscribe without communicating with
me, but this is simplest for a start, I think.
I’m
taking a lowest-common-denominator approach to the format of this newsletter,
sending it as a straight-ASCII note so anyone at all can access it securely,
but if you’d rather read something more formatted, go to OnlineNewsletter02. I’ll
also have links scattered throughout for supplementary materials.
Although
this material is copyrighted, feel free to forward this newsletter, as is, to
any friends who you think might be interested.
If
you have any comments, ideas for future newsletter content, or suggestions,
feel free to drop me a line, at dantow@singingsql.com
– I’m always happy to hear from you!
To see a cumulative listing
of all news, including old news from past newsletters, see all news.
To see a cumulative listing
of all links, including links already listed in past newsletters, see all links.
Fixing Broken SQL: In this presentation
at Hotsos, I covered material with the following abstract: Most developers spend most of their time maintaining
code they did not originate. Frequently, developers must fix SQL without
knowing its business purpose; this is especially true of performance
specialists, who often work on SQL for applications, or parts of applications,
that are unfamiliar. Surprisingly, there are patterns in most incorrect SQL
that you can learn to recognize and repair even without knowing the purpose of
the SQL. This presentation reviews a breakdown of the most common recognizable
patterns in broken SQL, and teaches how to find and fix what’s broken with
minimal information apart from the bare SQL. The resulting repairs frequently
enable better performance, usually fix corner-case functional defects, and
invariably make the SQL clearer and easier to maintain. For tuning specialists,
an understanding of these patterns enables an unexpected bonus service—while
tuning someone else’s SQL, you also find and fix a subtle functional problem
they did not even know existed.
I recently ran into this
problem on a major production system, and I suspect it is hurting many other
systems, so I offer this as a short object lesson. It may be primarily an issue
on Oracle, since Oracle makes it so easy and tempting to fall into this trap.
(Question to my readers who use databases other than Oracle: Does your database
support table-rebuild syntax that would make this trap all too easy to fall
into, too? I’ll publish your answer and cite your name (if you wish).) However,
even on non-Oracle databases that do not make it so easy to fall into this
trap, you could still have this problem if you engineered a parallel table
rebuild manually.
Here’s the scenario: Users
reported a serious drop in performance for a module. I’d already been running
automated collection of “V$” data from Oracle to see the breakdown of runtime
for all modules according to SQL statement, and to capture the actual execution
plans seen at the time by those SQL
statements. Rapid analysis of this already-available data demonstrated that one
particular statement was responsible for the performance degradation (creating
much more load after the reported problem than before), but that statement was
using the same plan as before the problem occurred, and reading the same number
of rows as before. The extra runtime was accounted for (again, according to my
already-collected data) by a much poorer hit ratio for the query (by a greater
number and cumulative duration of physical I/Os, that is, while seeing the same
number of logical I/Os as seen before the problem). By “poorer hit ratio” (if
you aren’t familiar with the term “hit ratio”) I mean that a larger fraction
(compared to before the problem) of the block reads that the query needed were
going straight to physical disk, because they were not in the block buffer
cache maintained for recently-read blocks in Oracle. (All databases cache most
recently read blocks, so this is not anything special to Oracle.) I shortly
learned that a large table involved in the query had been rebuilt by database
administration, in order to help performance, yet the rebuild was precisely
what had killed performance, of this query and, as it turned out, of many
other queries!
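For readers who would like to try something similar, here is a rough sketch (not my exact collection scripts) of the kind of snapshot query you can run against V$SQL to get this sort of breakdown; the module name is just a placeholder, and since V$SQL figures are cumulative, you compare two snapshots taken at different times to get the before-and-after picture:

SELECT sql_id,
       plan_hash_value,
       executions,
       buffer_gets,                                        -- logical I/Os
       disk_reads,                                         -- physical I/Os
       ROUND(elapsed_time / 1000000, 1) AS elapsed_seconds,
       ROUND(1 - disk_reads / NULLIF(buffer_gets, 0), 3) AS hit_ratio
FROM   v$sql
WHERE  module = 'Some Module'                              -- placeholder module name
ORDER  BY elapsed_time DESC;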
Here is the all-too-tempting
trap that Oracle offers anyone who is rebuilding a large table:
A common way to rebuild a
table is to execute a statement such as
CREATE TABLE NEW_TABLE AS SELECT * FROM OLD_TABLE;
(This would be followed,
normally, by dropping OLD_TABLE (or maybe renaming it to something like
OLD_TABLE_BACKUP, if you’re being conservative), then renaming NEW_TABLE to take OLD_TABLE’s place.)
Where the table being
rebuilt is really large (and really large tables are usually the only ones that
need to be rebuilt), Oracle has helpfully provided a super-simple way to speed
this up, however:
CREATE TABLE NEW_TABLE PARALLEL 10 AS SELECT * FROM OLD_TABLE;
Wow, just by adding
“PARALLEL 10”, you get a roughly tenfold speedup of the table-copying
statement! (The PARALLEL clause specifies that table-creation happens with that
degree of parallelism, if possible – multiple engines work on the rebuild
simultaneously. “10” is just used as an example, here – you can choose any
degree of parallelism that you want, within reason.) Now, a smart DBA will also
read the fine print, and learn that the PARALLEL clause also specifies that future queries against this table will try to
run with the same specified degree of parallelism, if it would help. Since that
is frequently not a good idea, and since the smart DBA is, after all, just
looking to speed up the rebuild, and is not looking to destabilize future
execution plans, the smart DBA (assuming that the former PARALLEL setting for
the table was “1”) will follow up the rebuild with:
ALTER TABLE NEW_TABLE NOPARALLEL;
This speedup of the rebuild
very understandably seemed like an obviously good thing, so the DBAs took full
advantage of it. Since my client’s DBAs are excellent, they had no trouble
getting this right, and correctly reset the table parallelism to NOPARALLEL
after the table rebuild, since we have no need for default parallelism on the
database (all SQL being so fast without it that parallelism isn’t needed at
all).
Normally, the result of a
table rebuild will be a more-compact table and indexes, and elimination of
chaining (rows that don’t fit in the same block they started in, as a result of
updates making rows longer after they are created), and this would lead
everyone to expect better caching
after the rebuild than before, so what happened?!? (Think about it for a minute
before reading on, if you want an exercise.)
Here’s what happened, under
the covers: Oracle has no more-efficient way to read rows quickly in an engine
(given that it needs to eventually read all
the rows) than a full table scan. If it is going to spread the load over 10
engines, however, there is no point in every engine reading the whole table.
The trick then, is to assign sub-parts of the full table scan to each engine.
If the table is 100000 blocks long, Oracle will assign the first engine to do a
sort of sub-table-scan of the first 10000 blocks (a sort of full table scan
that stops at the 10000th block, that is), and the second engine
will get a sub-table-scan of the second 10000 blocks (a scan that starts at the
10001st block, and ends at the 20000th block, that is),
and so forth. All of these engines
will be inserting the rows that they
read, as they read them, at the same time, to the top block of the new table
being built, however! If you used “PARALLEL 2”, this would look (treating each
row as a “card”) like shuffling a deck of cards once (an analogy particularly
for my poker-playing readers). A ten-way shuffle is harder to visualize, but
that’s what you get with “PARALLEL 10”.
(Use of freelists might
help, here. Freelists could mean that each of the parallel engines would insert
into different blocks, which would at least avoid shuffling rows within any
given blocks, a major improvement. However, there would still be a shuffling
effect, because blocks that used to be close together would end up far apart.
Since most disk subsystems do “read ahead” that pre-loads nearby disk blocks
into a disk-subsystem cache (which is distinct from the database cache),
“physical” I/Os (from the perspective of the database) to nearby blocks often tend to
be satisfied from the disk-subsystem cache (meaning that they are not really physical, at the disk-subsystem
level), especially where nearby blocks are somehow related to each other (for
example, representing rows that originated at almost the same time), so the
average time to perform a “physical” I/O will be lower when this disk-subsystem
cache is effective. The disk-subsystem cache of read-ahead blocks will be much
less effective, however, if blocks end up shuffled willy-nilly.)
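If you want to see the shuffling directly, here is a hedged sketch (ORDER_ID is a hypothetical column, standing in for any sequence-generated key that roughly tracks row-creation order): before a parallel rebuild, consecutive values tend to sit in nearby blocks, and afterward the block numbers jump all over the table.

SELECT order_id,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number
FROM   old_table
WHERE  order_id BETWEEN 1000000 AND 1000100    -- any narrow range of recent keys
ORDER  BY order_id;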
Now, there should never be a
functional issue with shuffling rows in a table – one of the fundamental tenets
of relational database theory is that the database is not obligated to store
rows in any particular order, physically, and is only obligated to return rows
in a particular order if you explicitly call for that with an ORDER BY clause.
Unfortunately, another fundamental tenet of relational database theory is that
the database makes no promise of how long it will take to return the rows, so
once we consider performance, we have to step outside the theoretical box.
In the real world, then,
almost all large tables have rows in the table that are needed frequently, and
rows that are rarely needed. There is a strong tendency for the
most-often-needed rows to be created at close to the same time. For example, in
most tables that track business events (orders, payments, account entries,
customer events, et cetera), the most-recent rows are much more frequently
queried than older rows. In a few cases, such as a customers table, the oldest
rows tend to represent the most-important entities, but few large tables are
accessed in such a way that users read old rows with the same frequency as new
rows. Normal, heap-type tables tend to place new rows all together, usually at
the top of the range of blocks storing each table, so there is a natural
tendency to cluster “hot” rows, the rows that we need most frequently. This is
very useful when the database caches blocks, because once some query (possibly
our own, but potentially someone else’s) reads a row, the cached block holding
that row is very likely to have other useful rows that will be accessed before
the block leaves the cache. The more the hot rows tend to clump together, the
fewer blocks we need to cache in order to hold all (or nearly all) the hot rows.
If all the hot rows are
stored together at the top of the table structure, though, and we then shuffle the table by copying it in a
parallel rebuild, the new copy will be much more poorly clustered, for example,
scattering the former top-tenth of the table throughout the new copy, where the
copy is done with “PARALLEL 10”! With poorer clustering, a given-sized cache
will then deliver a much poorer hit ratio, and this is just what happened in my
scenario – the table rebuild had destroyed the natural clustering of the hot
rows, and the hit ratio for that table had dropped through the floor, as a
result.
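One rough way to quantify this damage, by the way, is to watch the CLUSTERING_FACTOR statistic (after gathering statistics) for an index that follows row-creation order, such as an index on a sequence key or a creation-date column. A value near the number of table blocks means the hot rows clump together nicely; a value near the number of table rows means they are scattered. For example (table name as in the sketch above):

SELECT i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
FROM   user_indexes i,
       user_tables  t
WHERE  t.table_name = i.table_name
AND    i.table_name = 'OLD_TABLE';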
As a result, I can see few
scenarios where a parallel table rebuild of an ordinary heap table would be a
good idea in the long run, and I recommend extreme caution if you contemplate
doing this. (If you are rebuilding a single table, you can probably take the
time to do it serially. If you are rebuilding many tables together, you can
probably achieve the parallelism that you need by using multiple parallel
scripts that each rebuild different tables at the same time, but that rebuild each individual
table serially. If you have such a huge table that a serial rebuild would be a
problem for just the one table, then you probably should make that table a
partitioned table, so that each individual partition can be rebuilt serially,
but the several partitions can be rebuilt in parallel, if necessary.)
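As a hedged sketch of that last alternative (the partition, table, and index names here are hypothetical): each partition is moved serially, so its internal row order should be roughly preserved, but several such statements can run at the same time from separate sessions. Just remember that local index partitions on a moved partition are marked UNUSABLE and need rebuilding afterward.

-- Session 1:
ALTER TABLE order_lines MOVE PARTITION p_2005_q4 NOPARALLEL;
ALTER INDEX order_lines_date_ix REBUILD PARTITION p_2005_q4;   -- for each local index
-- Session 2, running at the same time:
ALTER TABLE order_lines MOVE PARTITION p_2006_q1 NOPARALLEL;
ALTER INDEX order_lines_date_ix REBUILD PARTITION p_2006_q1;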
I’d like to describe a class
of SQL tuning problems that I glossed over, somewhat, in SQL Tuning. A fraction of
problems in most business applications, and a somewhat larger fraction of
problems encountered in the course of database administration, can be described
roughly as follows:
There are two very large sets of result rows, which
we’ll call “Set-A” and “Set-B”, which are relevant to the query we want to
build. Both these result sets (rowsets) are so large that even the most
efficient possible query of either set, alone, would be too slow. As is the
case for most reasonable queries, however, the result set that we really want is a small result set
(sometimes even an empty result set
– we may just be trying to confirm
that the result is empty, showing that there is no problem, where a
non-empty end-result set would point to a problem!). This small end-result set
desired is either the intersection or the difference between the two very large
rowsets Set-A and Set-B.
If you visualize the query
diagram for the hypothetical query of Set-A, you should imagine a
normal-looking join tree having no highly-selective filter ratios attached,
since a highly-selective filter ratio (a filter ratio close to zero, that is)
would prevent Set-A from being so large. The same goes for the diagram for the
hypothetical query of Set-B. For example, the query diagram for Set-A might
look like:
Meanwhile, the query diagram
for Set-B might look like:
Finally, then, the query
diagram for the end-result query might look like
Naively, anyone who
understands query diagrams, looking at this last query diagram, would expect the
query to return roughly 10% of the number of rows in A, since the filter on A
discards 60% of A’s rows, then the join to B would reach a filter that would
discard another half of the rows that are left, then the join to C would reach
another filter that discards another half of the remaining rows. Any cost-based
optimizer would come to the same conclusion, and would be likely to just call
for hash joins or sort-merge joins of full table scans of each of these tables,
discarding rows failing the filter conditions during the full table scans.
Assuming that A is a very large table, this would not be a small result set. When I put together a query diagram that
looks like the one just above, with large tables and poor filters (or even when
I just notice that a query appears to have no very selective filter), I suspect
that I am likely dealing with an overlap query, a query looking for the small
intersection of two large sets, because most reasonable business-application
queries just don’t need to return so many rows. (In Chapter 10 of SQL
Tuning, I discuss exceptions to this rule, where you find a query that does return a very large rowset, and
what to do about them.) In the example, however, the truth value of the filter
condition on B is not statistically
independent of the truth value of the filter condition on C (the truth values
are anti-correlated), and, in fact,
the two filter conditions are almost never both
true for a B-entity that joins to a C-entity. Therefore, if we join B and C, we
end up with not 25% of the rows in B, but, instead, with almost no rows at all
(or even no rows), and then a
nested-loops join to A following an index on A’s foreign key pointing to B is
fast and easy. If it happens that B and C are both (unlike A) small enough to
allow rapid full table scans, then, a hash join of B and C, followed by nested
loops to A, will solve the problem very nicely. Even if B (and maybe C, too) is
too big for this to be a really fast option, it still may be your best option, short of denormalization.
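To make that concrete, here is a hedged sketch using the same A, B, and C (the column and index names are hypothetical, just to make the SQL legal). The hints force the hash join of B and C followed by nested loops into A through the foreign-key index, a plan the optimizer is unlikely to choose on its own, because it assumes the two filters are independent:

SELECT /*+ ORDERED USE_HASH(c) USE_NL(a) INDEX(a a_b_fkey_ix) */
       a.*
FROM   b,
       c,
       a
WHERE  c.c_pkey = b.c_fkey              -- join of B to C
AND    a.b_fkey = b.b_pkey              -- join of A to B, through A's foreign key
AND    b.rare_flag  = 'Y'               -- the filter on B
AND    c.other_flag = 'Y';              -- the filter on C (almost never true together with B's)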
Describing this problem in
plain English, the conditions on B and C are roughly mutually exclusive, and we
want a query to find the rare exceptions to the mutual-exclusivity rule. This
is just the sort of thing that we often need to do in business applications,
when solving glitches in the business processes, for example, looking for
non-shippable orders (license renewals, for example) where the company billed
(incorrectly) for shipping. (The good news is that if such exceptions are
analyzed correctly, and the root-cause problems are solved, you should end up
with an empty result, and you may not need to run the query any longer, or
anyway only often enough to confirm that the problem has not returned.) This is
also just the sort of thing DBAs often need to do, when looking for corrupt
data, orphaned rows, or rows where denormalized data is out of sync with
correct, normalized values. As for the business-process glitches, the need for
these DBA exception-condition reports goes away or becomes less urgent once the
root causes for the glitches are fixed.
Occasionally, the
mutually-exclusive conditions are on the same
table. In this happy case, the problem is fairly easy to solve – just create
(if it is not there, already) a multi-column index, or a functional index, that
enables an index range scan that goes directly to the rare exceptions. The only
thing to watch for, here, is that it is probable that the optimizer won’t like the desired range scan, because the
optimizer will assume, incorrectly, that the conditions are statistically
independent. Therefore, it is likely that you’ll need an index hint to force
use of the index.
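As a hedged illustration (table, column, and index names hypothetical), borrowing the shipping example from above: both conditions sit on a single order-lines table, so a two-column index reaches the rare exceptions with a short range scan, and the index hint protects against the optimizer assuming the two conditions are independent and choosing a full table scan instead.

CREATE INDEX order_lines_excep_ix
    ON order_lines (product_type, shipping_charge);

SELECT /*+ INDEX(ol order_lines_excep_ix) */
       ol.*
FROM   order_lines ol
WHERE  ol.product_type    = 'LICENSE_RENEWAL'   -- non-shippable product type
AND    ol.shipping_charge > 0;                  -- yet shipping was billed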
Where you find
mutually-exclusive conditions on different
tables, put together the minimal query that joins the tables containing the
mutually exclusive conditions, and work on optimizing just this query. (This
simplified query may eliminate many joins in the original query that were there
not to confirm the violation of mutual exclusivity, but, instead, just to
provide supporting detail for understanding the problem cases.) Try to tune
this simplified query following the normal query-tuning process, keeping in
mind that in the absence of any good filter ratios, you are likely to
prefer hash joins (or sort-merge joins, if your RDBMS doesn’t do hash
joins) of full table scans. If the simplified query is then fast enough
(hopefully the simplified query doesn’t involve any really large tables), then
add back the rest of the original query, making sure that the execution plan of
the complete query starts with the
same steps as the optimized, simplified query, then reaches the rest of the
tables through nested loops to the join keys (which you may have to force with
hints, since the optimizer likely won’t realize how few rows are likely to be
left at that point in the execution plan).
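Here is a hedged continuation of the earlier A-B-C sketch, with one hypothetical supporting table D added back (reached from A through a join key): the LEADING hint pins the optimized start of the plan, and the nested-loops and index hints keep the added table from disturbing it.

SELECT /*+ LEADING(b c a d) USE_HASH(c) USE_NL(a) USE_NL(d)
           INDEX(a a_b_fkey_ix) INDEX(d d_pkey_ix) */
       a.*, d.supporting_detail
FROM   b, c, a, d
WHERE  c.c_pkey = b.c_fkey
AND    a.b_fkey = b.b_pkey
AND    d.d_pkey = a.d_fkey              -- supporting detail only; no filter here
AND    b.rare_flag  = 'Y'
AND    c.other_flag = 'Y';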
Where the best you can do
after routinely tuning the minimal query that joins the tables containing the mutually
exclusive conditions still is too slow, apply a standard set of tricks that
work pretty well for any query that appears to be stuck with a long runtime:
If the combination of all
those tricks still does not resolve the problem, another trick is often
possible in these cases, even though this trick does not apply to other
long-runtime queries, in general:
Formulate a new,
more-selective query that just looks for recent
violations of the mutual-exclusivity rule. If (as is often the case) the
purpose of this query is to find examples
of business-process glitches (or data glitches, for the DBA problems), so that
the root-cause problems behind the examples can be discovered, then you’re
probably better off focusing on recent examples, anyway, since old examples may
have come from problems that have long since been solved. By focusing on recent
examples, you can add a filter condition on some table that returns only the
most recent rows of one of the tables involved in the mutually-exclusive
conditions, a filter condition that is likely selective enough to solve the
long-runtime problem, especially if it uses an index.
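A hedged sketch of that recent-rows trick, again with hypothetical names: the creation-date condition on B is selective and indexed, so the query can drive from that index and reach C through nested loops, instead of hash-joining two huge, unfiltered sets.

SELECT /*+ LEADING(b) INDEX(b b_creation_date_ix) USE_NL(c) */
       b.b_pkey, c.c_pkey
FROM   b,
       c
WHERE  c.c_pkey = b.c_fkey
AND    b.creation_date >= SYSDATE - 7   -- only the last week's rows
AND    b.rare_flag  = 'Y'
AND    c.other_flag = 'Y';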
If the query is still too slow, even after all the
tricks so far described (and this is quite rare), then you likely will need to
denormalize the data. Pages 265 and 266 of SQL Tuning describe how to promote
filters upward in a join diagram until they belong to a single table, using
denormalization. This can (and should) be done, generally, with database
triggers that ensure that the denormalization is rigorously consistent, even if
the application developers aren’t aware of the need to maintain it. Once the
mutually-exclusive conditions are promoted into a single table, then a
well-chosen multi-column index, or a functional index, on that table will reach
the rare exception rows with a rapid index range scan.
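As a hedged sketch of that last resort (not the book’s exact example, and with the same hypothetical names as before): C’s rarely-true flag is promoted into a denormalized column on B, kept consistent by a trigger, so a single two-column index on B reaches the exceptions directly. If C’s flag can change after B rows exist, a companion trigger on C (not shown) would be needed to propagate the change.

ALTER TABLE b ADD (c_other_flag VARCHAR2(1));

CREATE OR REPLACE TRIGGER b_denorm_flag_trg
BEFORE INSERT OR UPDATE OF c_fkey ON b
FOR EACH ROW
BEGIN
  IF :NEW.c_fkey IS NOT NULL THEN
    SELECT c.other_flag                 -- copy C's flag down into the new/changed B row
    INTO   :NEW.c_other_flag
    FROM   c
    WHERE  c.c_pkey = :NEW.c_fkey;
  END IF;
END;
/

CREATE INDEX b_excep_ix ON b (rare_flag, c_other_flag);

SELECT /*+ INDEX(b b_excep_ix) */
       b.*
FROM   b
WHERE  b.rare_flag    = 'Y'
AND    b.c_other_flag = 'Y';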
To see a cumulative listing
of all research problems (those below, and the earlier one from the first
newsletter), see all
research problems. There has been no solution submitted yet for the
earlier-published research problem.
If you look at stored
outlines in Oracle, it appears that Oracle has some undocumented hints that it
uses, at least internally. What undocumented hints are available on each RDBMS
(Oracle, SQL Server, et cetera), what do they do, and can they be used in
user-created SQL, or just internally? (This is an open-ended problem, clearly,
so I am not so much looking for a complete answer all at once as any
information out there that represents progress toward an answer.)
On most systems, vast system
resources are simply wasted, in terms of disk space, memory, and CPU. This
likely represents an opportunity, since these resources could be used in
conservative ways that would never significantly affect ordinary load, while
performing valuable background services. For example, during times of low CPU
and I/O consumption, index blocks, in B*-tree indexes, could be continually,
incrementally rebuilt, getting rid of enormous amounts of unused space
(especially in indexes with common deletes and updates), resulting in
more-compact, better-cached indexes without DBA intervention. As another
example (which Oracle 10g already does, to a limited degree), post-execution
optimization could be performed on high-resource-consumption SQL,
automatically, in the background, to find better execution plans for that SQL
the next time it executes.
©2006 Dan Tow, All rights reserved