Research Problems

2006-Q1: Stored Outlines Sensitive to Schema?

Is it possible to store two different outlines for the same SQL under the DEFAULT category, where that SQL is interpreted differently, depending on which SQL it is interpreted from, and to have each schema use the appropriate outline? For example, in one schema, "Orders," which is referenced in a query, might be a simple, local table, while in another schema, "Orders" is a UNION-ALL view of the local table and a remote-linked archive table, and the execution plans of the query must necessarily depend on the schema from which the query runs. If it is impossible to store two outlines under the DEFAULT category, what happens when the user attempts to execute the SQL, with user_stored_outlines=TRUE, from the wrong schema, the schema to which the outline does not apply? Does Oracle simply ignore the outline, or does it generate a plan that is wrong for that schema, even a plan that results in a wrong-rows bug? Categories seem to be a mechanism to let the process point to the appropriate set of outlines, depending on which schema the application attaches to, but this is inconvenient if the application has not been specifically written to know that it needs to ALTER SESSION to set user_stored_outlines=<The Correct Category for That Schema> when it connects to a particular schema. Is there some way to automate an ALTER SESSION command so that it happens automatically (some sort of trigger on connect?) when the schema is set or changed?

2006-Q2: Undocumented Hints

If you look at stored outlines in Oracle, it appears that Oracle has some undocumented hints that it uses, at least internally. What undocumented hints are available on each RDBMS (Oracle, SQL Server, et cetera), what do they do, and can they be used in user-created SQL, or just internally? (This is an open-ended problem, clearly, so I am not so much looking for a complete answer all at once as any information out there that represents progress toward an answer.)

2006-Q2: Ways to Use Wasted Resources

On most systems, vast system resources are simply wasted, in terms of disk space, memory, and CPU. This likely represents an opportunity, since these resources could be used in conservative ways that would never significantly affect ordinary load, while performing valuable background services. For example, during times of low CPU and I/O consumption, index blocks, in B*-tree indexes, could be continually, incrementally rebuilt, getting rid of enormous amounts of unused space (especially in indexes with common deletes and updates), resulting in more-compact, better-cached indexes without DBA intervention. As another example (which Oracle 10g already does, to a limited degree), post-execution optimization could be performed on high-resource-consumption SQL, automatically, in the background, to find better execution plans for that SQL the next time it executes.

2006-Q3: Bailing out of Bad Plans

If a human being were to execute SQL manually, against a series of tables and indexes that the human could see, he or she would likely choose a good-looking path to the data and start to follow that path. If, however, the human found that the results of following that path violated the assumptions he or she made when choosing that path, and appeared likely to take much longer than originally expected, he or she would probably revisit the question of which execution plan was best and potentially would start over with a better plan before wasting too much time on the bad plan. Does any database do this? Propose a good algorithm for a database that would.

2006-Q4: Opportunistic, Iterative Tuning

Here’s an extension of last quarter’s research problem, which was to consider cases when it might be worthwhile to halt and retune a query in the middle of its execution. Even if it never changed plans “midstream,” the optimizer could still gather information relevant to whether it should choose a new plan the next time that SQL executes. What sort of data should the database gather during execution pertaining to finding a better plan for a later execution. When should it bother? What sort of data should the database gather offline? When should it bother? How and when should it use that data to find a better plan for a later execution? (I have a number of ideas regarding possible answers, but I’d first like to open up the discussion.) Does any database besides Oracle10g (which has some capability along these lines) do something like this? Propose a good algorithm for a database that would. How well can we apply the answers to these questions to finding more effective methods to choose which SQL to tune manually and to improving the actual process of manual tuning?

2007-Q1: Handling Optimization with Links

Does Oracle in its very latest incarnations “look across the link” to see statistics from the other end of the link when optimizing the execution plan of a query that combines data on more than one database? Does any non-Oracle RDBMS do this?


Will any version of Oracle (or any other RDBMS) parse and optimize a raw (PL/SQL-function-free) query remotely, then pass the raw rows to the local database to perform any function calls and any GROUP BY operation that might depend on those function calls?

New problem: Estimating Self-Caching for a Join to a Co-Clustered Table

What data would optimizers need to calculate the tendency for two specific tables queried on a specific filter to co-cluster, so that the optimizer could correctly estimate self-caching in the joined-to table? What would be the formula to apply that data to a correct estimate of the performance benefits of co-clustering??

 

©2006 Dan Tow, All rights reserved

 

Home