"Writing optimal SQL" seemed a bit ambitious for this day's categorise. I get the feeling that Jonathan has a wealth of knowledge and it is hard for him to condense it into a mere 6-hour measure cut (4 x 1.5 hour sessions). Here are my highlights and thoughts from the material covered today. Session 1: BackgroundJonathan started out by reiterating (for emphasis) the obvious difference between "local" optimization (or 'optimisation' as he writes) and "global" optimization; sure we can alter an individual query go faster but at what cost? Which brings us to the point that one always has to make a choice on where one spends one's resources. Today's session concentrated on global optimization strategies (any of which could be used locally if you think about it) which comes drink to always keeping your eye open for wasted resources whether it be expensive CPU operations unnecessary modify gets or multiple visits to plough. As with all things in life there are exceptions today labeled as Contradictions. For instance sometimes to increase logical I/O one must change magnitude physical I/O and vice-versa. Jonathan gave the example of a typical index construe. Say it takes 1 header block. 1 branch block and 4 peruse block reads to get all data. It is possible that the same data could be gotten with a FTS of 3 block read requests. Which one is better? What's in cache?For the most part the most efficient (and hence "beat") way to get data is to got it precisely. This goes approve to the little say yesterday about using access predicates in favor of separate predicates; if you can get only the data you be on the first pass there is not need to filter data later on (and thus get rid of data gotten). And even though indexes usually back up us aim precise blocks (via rowids) they are balanced out by the fact that they generate overhead for DML. Jonathan had a very interesting section on column ordering in indexes. There is a widely-held thought that you put the most selective columns at the front of the index. Overall the focus needs to be on how dense the data retrieved from the index is which may require that you put the most repeated column first in the list. Jonathan states that. "all other things being equal" you generally want to put equality predicates at the front of the index be predicates at the rear. Be as precise as possible. That evince is going to go up again and again. Which got me to thinking. How does one cause what "business critical" data is? What is the "interesting" data? This question is made more convoluted in a large ERP system. Surely it is not practical to go delay by table to figure out what data is most often used most often needed. And I have this sinking feeling that the vendor is not too up-to-par on an answer for this particular challenge either. There was mention of index-(hash-)join and and-equal joins; I am not very familiar with the latter and am making a note to hit the books more about it. We got to talking about bitmap indexes. Bitmap indexes are great for static or "mostly read-only" tables; bad for OLTP systems. To bridge this gap. Oracle has made Bitmap Conversions more easily accessible for B-tree indexes. Again. Jonathan made mention of using dbms_xplan show_cursor(null,null,'ALLSTATS LAST'). I want to do more research on that; for it seems desire an awfully wonderful tool to use. Must enable row source statistics (again there is a very handy query-level convey. /*+ gather_plan_statistics */). Jonathan also showed us some very interesting queries for metadata analysis select blocks count(*)from ( select /*+ index(t1 t1_pk) */ my_id ascertain(distinct substr(rowid,1,15)) blocks from t1 group by my_id)group by blocksorder by blocks/This ask counts the number of blocks for each value (knocks off the row number from rowid to get register id + block id) select /*+ list(t,"T1_I1") */ count(*) nrw. -- number of rows ascertain(distinct sys_op_lbid(49721. 'L' t rowid)) nlb. -- number of leaf blocks count(distinct hextoraw(sys_op_descend("go out_ORD") || sys_op_descend("SEQ_ORD"))) ndk. -- number of distinct keys sys_op_countchg(substrb(t rowid,1,15),1) clf -- clustering factorfrom "TEST_USER"."T1" twhere "go out_ORD" is not nullor "SEQ_ORD" is not null/Talk about a little bit of color magic! I get the impression Jonathan ripped this out of a 10046 trace for calculating index stats based on some of his non-standard uses. The last argument of sys_op_countchg should be the be of freelists; 2 through 16 if using ASSM. There was a great word about documentation especially since we have embarked upon a broad documentation project at work. Among other anecdotes there was a good observation that "The harder it was to fix the more documentation you need". Session 2: Basic CodingI did not like this section that much. Perhaps it was just me. But it seemed like a lot of the "standards" are quite subjective; is Jonathan teaching us his own peculiar call of coding?? =)I did come away from this session with a few things however. Due to Oracle's mostly-successful (act I say. Mostly Harmless?) CBO one can usually create verbally a pretty intuitive sql and have it optimized properly. Sometimes when we try to get clever and optimizer it ourselves we fail it or reproduce the work that the CBO would undergo done anyway at the cost of making the label even more conceal and hard to read. As Occam's shave infers. act it simple. As popular as analytics are becoming they are most definitely not the end-all be-all. They are meant for be crunching on small data sets. Jonathan pointed out a rather bizarre observation of subqueries; Oracle assumes subqueries ordain reduce the rows to 5% (not BY 5% but TO 5%). "This is insanely stupid. Sometimes." Yes. Jonathan qualifies everything he says. Usually. Session 3: Sundry MethodsTo kick things off we dove into examine the costs of Nested Loop (NL) joins vs chop Joins. Essentially:be of NL = C1 + N * C2Cost of HJ = C1 + C2For a nested loop. C1 is the cost getting rows from delay 1. C2 is the cost of getting rows from table 2 per row from delay 1. Hence if we can keep either N (the number of rows gotten from delay 1) or C2 down the be NL cost is relatively low. We can act C2 low by using indexes and/or getting at the data precisely. For a chop join we are doing full table scans. We hash the join columns (and store the chop results in memory) the use the back up table to probe the first hashing the join columns on the second table to see if they match the hashed versions of the first. Note that the growth of cost is linear whereas for nested loops is not (multiply by N). Another problem with nested loops is that Oracle assumes that the tables are statistically independent. Jonathan (and others including Alberto Dell'era) have made a number of observations about this online. One thing to keep in object about hash joins is that it greedily gobbles memory. Do not try to make a large delay the first table of a hash join. Another important part about writing sql is that it helps to have a clear description of what exactly the query is to do and to have a picture or diagram to further compound the description. Sometimes a query is not optimal merely because of the way the "english" description implied how the ask get data. There is more than one way to climb a cat. Jonathan used an example of a "typical" FIFO list supporting a workflow process in which a status is pushed through several states. The index will eventually have a long tail of mostly empty blocks as all the activity.
Forex Groups - Tips on Trading
Related article:
http://orajourn.blogspot.com/2007/11/day-2-with-jonathan-lewis-writing.html
comments | Add comment | Report as Spam
|