Is it possible to store two different
outlines for the same SQL under the DEFAULT category, where that SQL is
interpreted differently, depending on which SQL it is interpreted from, and to
have each schema use the appropriate outline? For example, in one schema,
"Orders," which is referenced in a query, might be a simple, local
table, while in another schema, "Orders" is a UNION-
If you look at stored outlines in Oracle, it appears that Oracle has some undocumented hints that it uses, at least internally. What undocumented hints are available on each RDBMS (Oracle, SQL Server, et cetera), what do they do, and can they be used in user-created SQL, or just internally? (This is an open-ended problem, clearly, so I am not so much looking for a complete answer all at once as any information out there that represents progress toward an answer.)
On most systems, vast system
resources are simply wasted, in terms of disk space, memory, and CPU. This
likely represents an opportunity, since these resources could be used in
conservative ways that would never significantly affect ordinary load, while
performing valuable background services. For example, during times of low CPU
and I/O consumption, index blocks, in B*-tree indexes, could be continually,
incrementally rebuilt, getting rid of enormous amounts of unused space
(especially in indexes with common deletes and updates), resulting in
more-compact, better-cached indexes without DBA intervention. As another
example (which Oracle 10g already does, to a limited degree), post-execution
optimization could be performed on high-resource-consumption SQL,
automatically, in the background, to find better execution plans for that SQL
the next time it executes.
If a human being were to
execute SQL manually, against a series of tables and indexes that the human
could see, he or she would likely choose a good-looking path to the data and
start to follow that path. If, however, the human found that the results of
following that path violated the assumptions he or she made when choosing that
path, and appeared likely to take much longer than originally expected, he or
she would probably revisit the question of which execution plan was best and
potentially would start over with a better plan before wasting too much time on
the bad plan. Does any database do this? Propose a good algorithm for a
database that would.
Here’s an extension of last quarter’s research problem, which was to consider cases when it might be worthwhile to halt and retune a query in the middle of its execution. Even if it never changed plans “midstream,” the optimizer could still gather information relevant to whether it should choose a new plan the next time that SQL executes. What sort of data should the database gather during execution pertaining to finding a better plan for a later execution. When should it bother? What sort of data should the database gather offline? When should it bother? How and when should it use that data to find a better plan for a later execution? (I have a number of ideas regarding possible answers, but I’d first like to open up the discussion.) Does any database besides Oracle10g (which has some capability along these lines) do something like this? Propose a good algorithm for a database that would. How well can we apply the answers to these questions to finding more effective methods to choose which SQL to tune manually and to improving the actual process of manual tuning?
Does Oracle in its very latest incarnations “look across the link” to see statistics from the other end of the link when optimizing the execution plan of a query that combines data on more than one database? Does any non-Oracle RDBMS do this?
Will any version of Oracle (or any other RDBMS) parse and optimize a raw (PL/SQL-function-free) query remotely, then pass the raw rows to the local database to perform any function calls and any GROUP BY operation that might depend on those function calls?
What data would optimizers need to calculate the tendency for two specific tables queried on a specific filter to co-cluster, so that the optimizer could correctly estimate self-caching in the joined-to table? What would be the formula to apply that data to a correct estimate of the performance benefits of co-clustering??
©2006 Dan Tow, All rights reserved