Hotsos Symposium 2010 — Battle Against Any Guess Is Won

Video fragments of my session posted at the end — read on.

I arrived at Omni Mandalay Hotel on Sunday evening with Dan Norris. I was flying through Chicago and it turned out that Dan was on the same flight and only few rows behind me. Small world.

Preparations for the conference were very chaotic on my part and, of course, I didn’t have either of my presentations ready. I was very stressed and getting sick as well — it looked like a complete disaster waiting to happen. I’d like to say that I was feeling like Doug Burns as he often managed to get sick just before a conference. Of course, I worked on my slides for the last few days as well as on the flight and presentation was slowly getting there but boy was I tired!

I quickly said hello to the crowd in the bar on the way to my room and rushed away to do some more damage to my slides. And then I had a brilliant idea — I could still see one of my best mates and do something good about my presentation! I asked Doug if he was interested in the preview (he probably wasn’t interested but he couldn’t say it to me) especially that my session wasn’t on his original agenda. Of course, that would mean that he had to leave a bunch of other good friends and spend some time tete-a-tete. Knowing Doug, this is some of the hardest thing to ask from him but it shows how good of a friend he is! (Plus, everyone thinks that he is anti-social anyway. Shhhh!)

Doug has made my day — while he provided lots of ideas and feedback on few things that I was lucking, he generally approved the idea and confirmed that it wasn’t totally crazy. I guess that was all I needed back then and Doug knew how nervous I was about it. (Thanks mate!)

So I called Sunday a day very early and went to bed before midnight. I really needed some sleep. Woken up by the alarm at 5AM (I woke up few times during the night looking at the clock — making sure I didn’t sleep through) and slides were ready just before lunch. I even managed to do a test run and it took 65 minutes — a wee bit too long for one hour session. But it was good test and I knew I had to be just a bit more concise in few parts.

Mi morning was very productive. Unfortunately, I missed the opening keynote from Tom Kyte. Such a pity! If what Doug wrote is true, Tom was talking about the mistakes we make *because* of our experience and our assumptions. This was exactly one of the points I was making in my Battle Against Any Guess — experience is danger. I wish I could see Tom’s example. Oh well, maybe another time.

I managed to attend half of the Richard Foote’s session on indexes but my mind was far away — with my own slides. Though, I did manage to focus on bitmap indexes part and the myth of bitmap indexes not working well for columns with high cardinality. Very interesting conclusions. I’m still wondering how much overhead updates will do to such bitmap index.

After lunch, it was my turn. I ordered few copies of the latest OakTable book — Expert Oracle Practices: Oracle Database Administration from the Oak Table — that I co-authored with the bunch of other Oakies. I contributed chapter 1 in the book titled just like my presentation — Battle Against Any Guess. The plan was to give a copy away during the presentation and do a draw for another one at the end of the session. I was so nervous that I forgot about it until the end of the session so I just did a draw for two copies. The lucky winners were Lynn-Georgia Tesch and Surendra Anchula. Congratulations! For the rest of you who left the contact details — please stay tuned and we’ll organize few things online.

Now the main topic of this post — my presentation. What’s unusual about this session is that it’s not some technical stuff that I usually do but a more conceptual and motivational talk. Could I pull it off? Well, I think it went fairly well in general even though I did identify few rough places and my lack of English language mastering. Might need to work a little bit more on the flow of the presentation.

We had quite a few good laughs. Later, people in the next hall were asking about it and Dan was making the jokes on the stage so it must have been loud. Anyway, I think nobody fell asleep and I managed to get people thinking about the topic. I received many “thank you” notes yesterday and compliments on a good session so by the end of the day I was more and more pleased. Thanks everyone for attending and especially big thanks to those of you who brought to my attention examples from their own battles. If you have more to discuss — contact me by email (my last name) {at} pythian.com.

Thanks to Marco Gralike for recording some fragments and sharing them. I think he has more to come.

This is the introductory couple minutes. You can definitely notice how nervous I am starting on the stage:

Solving the wrong problem example:

That’s all for now. Stay tuned — more to come.

Fun with Filters

mwidlake | Mar 9, 2010 13:15 +0000

This post is about the importance of filter statements in execution plans and how they can indicate problems that are not always apparent.

I had a problem recently with some code that had gone rogue – it had been running in a few seconds, being executed every 10 minutes or so. Now it ran until either someone killed it off or it got “snapshot too old” errors. I pretty much knew it was prompted by stats being gathered on the tables in question as we had just gathered stats on the tables in that schema.

The code was something like this (it is not too important what exactly the code was). Oh, and this is on 10.2.0.3, enterprise edition with partitioning.

select  accounted, max(recorded_date),count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
   and 1= 1 – this is pointless, it replaces a check of “run”=”run”
   group by accountid,DATA_SRC_COUNTRY_ID

Those tables are quite large, several million rows in each. The code is basically scanning all of the rows in the table as there are no indexes to support the query and it is written in a way that, well, I would not write it.

Digging around in some monitoring tools I use I confirmed that the code has swapped plan

inst SQL_ID        Plan Hash   First Load   execs
---- ------------- ----------- ------------ ------
  TOT_ROWS       TOT_BG      TOT_DR        TOT_CPU    CREATED_DT
---------- ------------ ----------- --------------  ------------
   3 1t7ma4xn3rw46  3183284037 100203 09:10     125
     7,854     6,457,885   1208,149    816,476,585  091020 10:27
   3 1t7ma4xn3rw46  3127554972 100206 09:13       2
         0  1260,936,642        980  94599,678,960  100205 09:19

Version 1 comes back in 5 or 6 seconds. Version 2 does not effectively come back, which is why it records 0 rows. You can see that the Disk Gets are pretty low in version 2 (allowing that it was never left to finish) but Buffer Gets and CPU are both massively up. So much so, it exceeds where I format the numbers with comas (as, if they are THAT big, they need fixing anyway).

I looked at the plans and, though they were different, nothing jumped out at me. So I trimmed down the code and built it up section by section until I saw something significant change. This is my general mode of tuning code if nothing initially occurs to me.
As usual, I started with any scalar or inline code, in this case that SELECT…MINUS…SELECT in the where clause.

SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
Minus
SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_

This came back in 3.145 seconds, with all of 62 rows. That is fine.

Now I added back the use of the MINUS code as a WHERE clause to the outer query.

select accountid
from  W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      ) 

This came back in 5.22 seconds with 4468 records. Again, I can live with that, we do not need sub-second response, we need sub-minute response.

So I now added back in the group by accountid…

select accountid,count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
group by accountid

This does not come back before I have fetched and consumed a cup of tea. That is a significant change.

I was not expecting this, why would sorting under 5 thousand rows stress a database that can scan multi-million row tables in seconds? There is all that CPU being burned up and huge numbers of buffer gets, but not much in the way of disc IO, so it is probably doing something that can be done in memory a very, very large number of times.

What is the difference in plan? {Oh HELL, they look rubbish – sorry, click on the images to blow them up, I’d suggest right-click and open in a new tab, and, YES, I am working on making my blog a little more “wide and friendly”.}. Good plan first, bad plan second

The view step has disappeared, which was an expensive step so should help not hinder.
The “SORT GROUP BY” has appeared, which it would do as we introduced the “GROUP BY” clause.
The hash join has been replaced with a filter. In a more complex plan you might miss this replacement, you would maybe notice the hash join disappearing but a filter, well, it is checking some simple “WHERE” clause isn’t it?

Well, the hash join was joining the results coming from the two steps feeding into it, the PARTITION RANGE FULL and the VIEW (which is the in-memory construct of the SELECT..MINUS…SELECT statement).

Now the Filter is filtering the results from the PARTITION RANGE ALL with the results from the MINUS. At this point I’d like to highlight that the predicted cardinality and bytes coming back for the steps within the union have reduced by a factor of 100 from the good and bad plans. And I’ll also admit I hid some key detail in the screen shot. I am not showing the access and filter predicates.


The above is a screen shot showing that in the new code there are no filter predicates but an access predicate, for the access to the view (ie the select…minus…select). For easier reading, the full access predicate is below, rather than in the screen shot:

“$nso_col_1″=TO_CHAR(“ACCOUNTID”)||TO_CHAR(“DATA_SRC_COUNTRY_ID”)

However, for the slow code, there are no access predicated but are filter predicates. Again, the screen shot shows that there are predicates and I show the full text below. (Screen shots are from PL/SQL developer, btw).

2- EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")
   ||TO_CHAR("DATA_SRC_COUNTRY_ID")
FROM "W_LCG_OPENING_" "W_LCG_OPENING_"
WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))
MINUS (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")
FROM "W_LCG_CLIENT_" "W_LCG_CLIENT_"
WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B3)||TO_CHAR(:B4)))	

8- TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2)	

10- TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2)	

Basically, the filter is to run the union query for every row from the driving query, passing in the relevant filter clause as an extra predicate to each of the individual queries of the SELECT…MINUS…SELECT

Clever, but not of much help to us as the performance is awful.

I showed this to a colleague and their immediate response was “why does that not show up as a nested loop? How am I supposed to spot that the “filter” is doing so much?” There were a couple of robust Saxon words mixed in with that statement.

So, Beware Filters replacing joins and get in the habit of checking out the filter and access predicates

If you use an old-style template for showing plans {like I do, a guilty sin of mine}, or a GUI where you have not selected that the filter and access predicates be shown, you may well not get them displayed. If you use autotrace in SQL*Plus, you will though:

db3_mw> select accountid,count(*)
  2  from W_LCG_OPENING_
  3  where accountid||DATA_SRC_COUNTRY_ID in
  4       (
  5         SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
  6         minus
  7         SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
  8        )
  9  group by accountid
 10  /

Execution Plan
----------------------------------------------------------
Plan hash value: 397856216

--------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cos
t (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |   911 |  7288 |
15M  (7)| 17:25:41 |       |       |
|   1 |  SORT GROUP BY           |                  |   911 |  7288 |
15M  (7)| 17:25:41 |       |       |
|*  2 |   FILTER                 |                  |       |       |
        |          |       |       |
|   3 |    PARTITION RANGE ALL   |                  |  3153K|    24M|  83
11   (4)| 00:00:34 |     1 |   840 |
|   4 |     TABLE ACCESS FULL    | W_LCG_OPENING_   |  3153K|    24M|  83
11   (4)| 00:00:34 |     1 |   840 |
|   5 |    MINUS                 |                  |       |       |
        |          |       |       |
|   6 |     SORT UNIQUE NOSORT   |                  | 31535 |   246K|  86
15   (7)| 00:00:35 |       |       |
|   7 |      PARTITION RANGE ALL |                  | 31535 |   246K|  86
11   (7)| 00:00:35 |     1 |   840 |
|*  8 |       TABLE ACCESS FULL  | W_LCG_OPENING_   | 31535 |   246K|  86
11   (7)| 00:00:35 |     1 |   840 |
|   9 |     SORT UNIQUE NOSORT   |                  |   132 |  1056 |
17  (18)| 00:00:01 |       |       |
|* 10 |      INDEX FAST FULL SCAN| W_LCG_CLIENT__PK |   132 |  1056 |
16  (13)| 00:00:01 |       |       |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_C
OUNTRY_ID") FROM
              "W_LCG_OPENING_" "W_LCG_OPENING_" WHERE
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)|
|TO_CHAR(:B2))MINUS (SELECT /*+ */
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")
FROM "W_LCG_CLIENT_"
              "W_LCG_CLIENT_DETAILS" WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_S
RC_COUNTRY_ID")=TO_CHAR(:B3)||TO_CHAR(:B4)
              )))
   8 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)|
|TO_CHAR(:B2))
  10 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)|
|TO_CHAR(:B2)) 

What did I do to fix the problem? Well, even though the code originally went bad due to stats being gathered, I could not force it back to a nice plan after an hour or two playing with gathering new stats so, in this case, I used an UNNEST hint.

select accountid,count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT /*+ unnest */ accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
group by accounted

And it forced the plan back to the version using a HASH join.

The hinted plan

I’m a little unhappy about that hint, but the code needed fixing and the hint did what I wanted it to {I think it was poor of me to only hint one of the two minus statements and I do not like resorting to hints just because I can’t work out what is “wrong” with the stats – If I work out what is wrong and still need the hint, fair enough}. However, I had to be pragmatic and get the code fixed and working in Live, so it has gone in with a hint


Hotsos 2010 – Congratulations, Marco!

You managed to capture a couple of minutes of my presentation when there was a picture on the screen!


Hotsos 2010 – My Presentation

I really don't know how to blog about this because every time I feel I'm honestly self-critical, everyone thinks I'm close to suicide or something. I like to think I notice both the good and the bad but am probably more likely to speak openly about the bad. If you ask me it's a Scottish thing about not getting above yourself or blowing your own trumpet too loudly. Or maybe it's just that the only way you can improve is by noticing the bad stuff and fixing it? Someone once said to me "You can't be an insecure overachiever without first being insecure" ;-) and they aren't Scottish, so I shouldn't generalise. Updated later - it was Cary Millsap who came up with this line. I loved it when he mentioned it a few months ago and still do.

In this case I'll try to be even-handed and finish with the positives!

Negatives

The whole point of this presentation is that it's about 50 minutes of demonstrations and about 10 of slides. For the first 30 minutes, the demonstrations would not work. If you think that's a success, there's something the matter with you and you should probably never give any presentations!

This is the killer. In retrospect I know what the problem was and have fixed it previously with the help of others but did not do the same on the new laptop! As I moved in and out of wireless range, the lack of a network connection at the Windows end completely threw VMWare and my demos. But, believe me, when your demos have been working for a couple of weeks, you don't have long until your presentation and they stop working intermittently, it's difficult to be cool and analytical. I was so distracted by other things that might go wrong, I missed one and, having missed it, I wasn't cool enough to recognise the symptoms. Fortunately, an old Hotsos friend in the room came up with the goods in the form of a little Sprint wireless box that got me hooked up. I kissed him, but there were no tongues involved.

With only 30 minutes left, there was very little I could show and it completely ruined the whole flow of the presentation, which I'd worked so hard to get right and which I know can be terrific, because I've done similar presentations before and had been looking forward to doing the best version yet. People took the time to come and see it, I let them down and I'm sorry about that. Nobody is likely to change my view on that.

This presentation is a big deal to me, never mind anyone else. I work hard on these things, try to cover all the angles, take time off work and all because I like teaching people new stuff. When I take a week off to attend a conference, but also to present, it's a bit of a blow when your one shot fails. If you're not passionate about your presentations (and I somehow doubt anyone isn't) again, you shouldn't be presenting.

Positives

Sh*t happens when you get involved with computers. Yes, folks, I know that, but that's also why you play around with demos for a long time to minimise the possibility that it will!

This might not sound like a positive and it's dangerously close to sounding like an excuse, but the fact is that I've had 2 out of 3 presentations go very wrong recently. One of the consistent factors in this is that I switched to VMWare because I had to to run 11gR2 on Windows. It's not the same as blaming VMWare to say that I've been having to deal with stuff I haven't for a long time. When everyone was talking about VMWare as the way to go for presentations, I remember thinking 'I don't know, just seems like more moving parts that might break to me.' and I kept reading blog posts about demos being broken and then fixed just in time, all of which were on some virtualisation platform or other, but didn't have the confidence to say something. All I know is, say what you like about Windows, but I've hardly ever had a problem in multiple 2-day course teaches of performance and OEM stuff! Still, it's down to me to get on top of what are some simple issues.

Listen, I know I can present. No false modesty round here. So, to wrap this up on a positive note, I'm well aware that there are few of the people who I see present who could have managed to get through that first half hour, make people laugh, keep thinking about the problem and manage to get a short demo of Swingbench into the bargain. I think I also managed to salvage something out of the last half of the presentation without completely collapsing into a heap. Sure, I was a bit brain-addled by then, but I would have liked to have seen how others might have coped ;-) Frankly, I kept waiting for the room to empty (I've seen it happen) but the vast majority stuck around to the end. Maybe they were sadists! LOL

I am not and will never manage to be happy with that presentation but lessons have been learned, it's just a presentation and there'll be lots of others. I know that.

Postive Solution 1 - If anyone wants to try to grab me while I'm at the conference, I'll show you the screens and demos. They're very cool ;-)

Positive Solution 2 - I think Alex Gorbachev might try to arrange for me to repeat the presentation properly as a webinar. I've already done this at my current customer site once (on the 10g stuff) and it went reasonably well. I'd also ... get this ... asked Marco to video it for me yesterday with his snazzy mini-setup because I thought I might post a few bits online if they were particularly good so those that can't make it to conferences could get a taste of it. Actually, it appeals to my cold, self-deprecating sense of humour to post some of it online soon. I promise I won't make it too self-flaggelating though!

P.S. For Paul Vallee. Paul it is not all good :-)

Oracle really wants to hear from you! – Patching Survey

Hari | Mar 9, 2010 05:57 +0000

Welcome, Oracle really wants to hear from you!

If you are involved with the patching process (evaluating, approving, testing, or deploying), Oracle wants to hear from you. Our primary focus is on how to improve this process for Oracle customers.

We sent this to you because you either; responded to a previous survey or forum post where we asked if you would participate in future studies, or are currently involved with downloading patches from My Oracle Support.

This survey IS NOT for providing general feedback about Oracle products or My Oracle Support. Please use your support process for that type of feedback.

This survey will take about 10 to 15 minutes.

You will be prompted concerning Oracle Database/Middleware, E-Business Suite (Oracle Applications) and Enterprise Manager.

The survey will automatically skip the areas which you respond as not applicable.

Concerns? Want to verify this is from Oracle? Email us directly.
My Oracle Support and Enterprise Manager Patching Team
ajay.prasad@oracle.com

Take the Survey

Clustering Factor: Row Migration’s Victim

This article describes the effects of a high row migration rate on the clustering factor and the optimizer’s ability to select the best execution plan.

In my previous article—Row Migration and Row Movement—I have demonstrated that the “insert empty, update everything” anti-pattern can lead to 100% row migration. This article continues the research on row migration and unveils surprising effects on the clustering factor. To be precise, the clustering factor can become completely bogus in presence of a very high row migration rate. Once the clustering factor is “wrong”, it’s just a finger exercise to construct an optimizer trap and proof that row migration can affect the query plan.

To start off, I create a table similar to the one in the previous article. However, I add one more column and populate it with random values from 0 to 9. I will need this column to build my optimizer trap later on.

CREATE TABLE row_mig1 (
  a CHAR(2000),
  b CHAR(2000),
  c CHAR(2000),
  x NUMBER      NOT NULL,
  filter NUMBER NOT NULL,
  CONSTRAINT row_mig1_pk PRIMARY KEY (x)
) ENABLE ROW MOVEMENT;

BEGIN
   FOR i IN 1..100000 LOOP
      INSERT INTO row_mig1
                    (x, filter)
             VALUES (i, trunc(dbms_random.value(0, 10)));

      UPDATE row_mig1
         SET a ='a', b = 'b', c = 'c'
       WHERE x=i;
   END LOOP;
END;
/
COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1', CASCADE=>TRUE);

Up till now everything is very similar to the last article, except that I used DBMS_STATS instead of ANALYZE TABLE. Although we know exactly that almost every row was migrated, DBMS_STATS doesn’t care about that:

SQL> SELECT num_rows, chain_cnt
       FROM user_tables
      WHERE table_name='ROW_MIG1';

  NUM_ROWS  CHAIN_CNT
---------- ----------
    100000          0

SQL> 

However, let’s have a look at the index:

SQL> SELECT num_rows, leaf_blocks, clustering_factor
       FROM user_indexes
      WHERE index_name = 'ROW_MIG1_PK';

  NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
    100000         187               918

SQL> 

Well, that’s an excellent clustering factor. A low clustering factor indicates that the index is in the same sequence as the table. That’s true in that case because the index column corresponds to the order in which the rows were inserted to the table. However, 918, how can this be? The clustering factor is supposed to be between the number of table blocks—which indicates a good clustering factor—and the number of index rows—which is the worst case. So, let’s look at the table size:

SQL> SELECT t.blocks             table_blocks
          , i.clustering_factor  index_clustering_factor
          , i.num_rows           index_rows
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE index_name = 'ROW_MIG1_PK';

TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS
------------ ----------------------- ----------
      100877                     918     100000

SQL> 

According to that, the lower bound for the clustering factor is higher than the upper bound. Hmm, let’s investigate the clustering factor manually and verify the distribution of the rows across the table blocks:

SQL> SELECT * FROM (
       SELECT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) block_number
            , COUNT(*) rows_in_block
         FROM row_mig1
        GROUP BY DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid)
     ) WHERE ROWNUM <=10;

BLOCK_NUMBER ROWS_IN_BLOCK
------------ -------------
         523           109
         524           113
         525           109
         526           109
         527           110
         536           109
         537           110
         538           109
         539           109
         540           110

10 rows selected.

SQL> 

There are 109 different ROWIDs that refer to the block number 523. But we know that each record has about 6k. It is impossible to fit 109 rows into a single block of 8k. However, the “insert empty, update everything” anti-pattern makes it possible. The game goes like this:

  1. The very first row is inserted into a new block.

  2. The first row is updated, and fits into the same block. The free space in that particular block is still more than a kilobyte.

  3. The second row is inserted into the very same block, because there is enough free space available in that block.

  4. The update of the second row triggers the migration of that row to a different block.

    The row migration changes neither the ROWID nor the index entry. That means that the forwarding address—that is, somehow, the new ROWID—is stored in the original block so that the index can find the row.

  5. The third row is inserted into the very first table block, again.

    Because the second row was moved into a different block, the very first block has still free space. There is only the first row—as a whole—and one forwarding address stored in that block. So, the insert of the third row can take place there.

  6. And so on. Until the forwarding addresses fill the block—up to PCTFREE.

That’s a good one, hmm?

We can even verify that:

SQL> SELECT * FROM (
       SELECT MIN(X)                               min_x
            , MAX(x)                               max_x
            , MAX(x) - MIN(x) + 1                  diff_x
            , DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) block_number
            , COUNT(*)            rows_in_block
         FROM row_mig1
        GROUP BY DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid)
        ORDER BY min_x
     ) WHERE ROWNUM <=10;

     MIN_X      MAX_X     DIFF_X BLOCK_NUMBER ROWS_IN_BLOCK
---------- ---------- ---------- ------------ -------------
         1        113        113          524           113
       114        222        109          523           109
       223        332        110          537           110
       333        441        109          538           109
       442        550        109          539           109
       551        660        110          540           110
       661        769        109          541           109
       770        878        109          542           109
       879        987        109          543           109
       988       1096        109          525           109

10 rows selected.

SQL>

You see that the first row was inserted into block number 524. All subsequent rows up to the 113th were put into the same block. When that block was finally filled up—with one row and 112 forwarding addresses—the game starts over in the next block. All the INSERT statements took place in just 918 distinct blocks. Because the ROWID is assigned during the INSERT, the subsequent migration of the row due to the UPDATE is not reflected in the ROWID.

Neither DBMS_STATS nor ANALYSE TABLE look into the table to check if the row is really there or if the particular block it is just an accumulation of forwarding addresses. From their perspective, all the rows are in the same block—this is how the clustering factor is calculated. Although correctly calculated—technically—and up to date, the clustering factor of this index does not reflect the real situation.

The “correct” value—in that sense that it reflects the data distribution correctly—for the clustering factor would be 100.000; that is, the number of rows. If the clustering factor equals the number of rows in the index—which is the worst possible case—it means that there are no two adjacent index entries that refer to the same table block. This is actually the case because no 8k block can contain two complete 6k rows.

The Clustering Factor is Wrong, So What?

So, what’s the problem if the clustering factor is wrong?

The problem is that the Cost Based Optimizer (CBO) uses the clustering factor in its cost calculation for an INDEX RANGE SCAN (see Fallacies of the Cost Based Optimizer [pdf]). That means, the cost of the INDEX RANGE SCAN will be too low, because the clustering factor is way too low. Luckily there is an effect that makes all that less problematic; all indexes on that table are affected.

Let’s make a second index to verify that:

SQL> CREATE INDEX row_mig1_idx ON row_mig1(filter);

Index created.

SQL> BEGIN
       DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1',
                                     CASCADE => TRUE);
     END;
     /

PL/SQL procedure successfully completed.

SQL> SELECT i.index_name         index_name
          , t.blocks             table_blocks
          , i.clustering_factor  index_clustering_factor
          , i.num_rows           index_rows
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE table_name = 'ROW_MIG1';

INDEX_NAME   TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS
------------ ------------ ----------------------- ----------
ROW_MIG1_PK        100877                     918     100000
ROW_MIG1_IDX       100877                    9180     100000

SQL> 

The clustering factor of the new index is bigger than the one of the original index, but still way off its real value. The “correct” clustering factor for the new index would be 100.000 because there are no two adjacent index entries that refer to the table block. Well, they actually do, but there is nothing more than the forwarding address in those blocks.

Ten Times as High

The clustering factor is ten times as high because the filter column has ten distinct value. That means that the adjacent index entries will refer to ten blocks at least, because those entries were not inserted in the same order as they are stored in the index. On the other hand, the index on the filter column is not only sorted by the filter value, but also by the ROWID—as every nonunique index in Oracle—so that the clustering factor is kept at a minimum. Finally, the clustering factor is ten times as high because it refers to ten times as many table blocks, but does not grow above that because the index order keeps the clustering factor at a minimum.

Although the clustering factor does not correctly reflect the efforts to fetch the table rows, it correctly reflects the relation between the two indexes. The primary key index has a lower clustering factor because it has the same sequence as the table itself. On the other side, the index on the filter column has a different order, thus the value is higher. The side note explains why it is Ten Times as High.

If all indexes are affected, there is hardly any real problem I can see for a single table access. Even queries that can be executed with two different indexes, the CBO will most likely not do wrong because both clustering factors are misleading.

The Join Trap

If it’s not possible to confuse the optimizer with a single table, let’s use more of them. So, I try to build a trap where the optimizer’s decision of the join order is influenced by the phony clustering factor so that the optimizer takes the less efficient execution plan.

For that purpose, I build a second table very similar to the first one. There are only two differences:

  1. I don’t follow the “insert empty, update everything” anti-pattern—there will be no row migration.
  2. The selectivity of the filter columns is slightly increased (about 6% instead of 10%).

Here is the overall script:

CREATE TABLE row_mig2 (
  a CHAR(2000),
  b CHAR(2000),
  c CHAR(2000),
  x NUMBER      NOT NULL,
  filter NUMBER NOT NULL,
  CONSTRAINT row_mig2_pk PRIMARY KEY (x)
) ENABLE ROW MOVEMENT;

INSERT INTO row_mig2 (x, filter, a, b, c)
     SELECT level, trunc(dbms_random.value(0,17)), 'a', 'b', 'c'
       FROM dual CONNECT BY level <= 100000;
COMMIT;

CREATE INDEX row_mig2_idx ON row_mig2(filter);

EXEC DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG2', CASCADE=>TRUE);

Now let’s check the statistics:

SQL> SELECT i.index_name         index_name
          , t.blocks             table_blocks
          , i.clustering_factor  index_clustering_factor
          , i.num_rows           index_rows
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE table_name = 'ROW_MIG2';

INDEX_NAME   TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS
------------ ------------ ----------------------- ----------
ROW_MIG2_PK        100749                  100000     100000
ROW_MIG2_IDX       100749                  100000     100000

SQL> 

Please note that the clustering factor is at the upper bound of the expected range; that means, it indicates that there are no two adjacent index entries referring to the same table block. That’s somehow logical, if we consider that every row is in its own table block.

After that preparation, I can present my “trapQL”:

SELECT *
  FROM row_mig1 d1
  JOIN row_mig2 d2 ON (d1.x = d2.x)
 WHERE d1.filter = 0
   AND d2.filter = 0;

It is a rather trivial join on the primary keys. Additionally each table is filtered on the filter column for one particular value. The trap works because a NESTED LOOPS join is possible in both ways. Either by filtering the first table by an INDEX RANGE SCAN on filter and then fetch the corresponding entry from the second by a primary key lookup, or vice versa. However, because we—as well as the optimizer—know that the second table is more selective than the first one, the more efficient way to execute that query is to first perform the INDEX RANGE SCAN on the second table and then join in the first one. In that way, the number of primary key lookups is reduced and the overall performance will be better. Considering that the first table suffers from heavy row migration, that effect becomes even more relevant. However, it is of course the purpose of the discussion to proof that the optimizer is doing “wrong”:

SET AUTOTRACE TRACEONLY;
SET TIMING ON;

SELECT /* original clustering factor */ *
  FROM row_mig1 d1
  JOIN row_mig2 d2 ON (d1.x = d2.x)
 WHERE d1.filter = 0
   AND d2.filter = 0;

SET TIMING OFF
SET AUTOTRACE OFF

And the result is:

577 rows selected.

Elapsed: 00:01:08.35

Execution Plan
----------------------------------------------------------
Plan hash value: 4162018446

---------------------------------------------------------------------
| Id | Operation                     | Name         | Rows  | Cost  |
---------------------------------------------------------------------
|  0 | SELECT STATEMENT              |              |  5882 | 10941 |
|  1 |  NESTED LOOPS                 |              |       |       |
|  2 |   NESTED LOOPS                |              |  5882 | 10941 |
|  3 |    TABLE ACCESS BY INDEX ROWID| ROW_MIG1     | 10000 |   938 |
|* 4 |     INDEX RANGE SCAN          | ROW_MIG1_IDX | 10000 |    20 |
|* 5 |    INDEX UNIQUE SCAN          | ROW_MIG2_PK  |     1 |     0 |
|* 6 |   TABLE ACCESS BY INDEX ROWID | ROW_MIG2     |     1 |     1 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D1"."FILTER"=0)
   5 - access("D1"."X"="D2"."X")
   6 - filter("D2"."FILTER"=0)

Statistics
----------------------------------------------------------
        913  recursive calls
          0  db block gets
      35584  consistent gets
      21027  physical reads
          0  redo size
      39012  bytes sent via SQL*Net to client
        837  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
        577  rows processed

The optimizer has chosen to perform the INDEX RANGE SCAN on ROW_MIG1_IDX first. The optimizer is well aware of the fact that the INDEX RANGE SCAN will return about 10000 rows; still it was preferred over the alternative execution plan.

So let’s check what happens if we tell the optimizer the truth about that table’s indexes?

BEGIN
  DBMS_STATS.SET_INDEX_STATS(null, 'ROW_MIG1_PK', clstfct=>100000);
  DBMS_STATS.SET_INDEX_STATS(null, 'ROW_MIG1_IDX',clstfct=>100000);
END;
/

SET AUTOTRACE TRACEONLY;
SET TIMING ON;

SELECT /* updated clustering factor */ *
  FROM row_mig1 d1
  JOIN row_mig2 d2 ON (d1.x = d2.x)
 WHERE d1.filter = 0
   AND d2.filter = 0;

SET TIMING OFF
SET AUTOTRACE OFF

The only change is that the statistics have been manually updated to the “more correct” clustering factor of 100.000. Unfortunately neither DBMS_STATS nor ANALYZE TABLE can be used for that purpose, so I did it manually. Please note that the table itself was not changed; most of the rows are still migrated.

And the result is:

577 rows selected.

Elapsed: 00:00:59.91

Execution Plan
----------------------------------------------------------
Plan hash value: 3004301745

---------------------------------------------------------------------
| Id | Operation                     | Name         | Rows  | Cost  |
---------------------------------------------------------------------
|  0 | SELECT STATEMENT              |              |  5882 | 11780 |
|  1 |  NESTED LOOPS                 |              |       |       |
|  2 |   NESTED LOOPS                |              |  5882 | 11780 |
|  3 |    TABLE ACCESS BY INDEX ROWID| ROW_MIG2     |  5882 |  5896 |
|* 4 |     INDEX RANGE SCAN          | ROW_MIG2_IDX |  5882 |    12 |
|* 5 |    INDEX UNIQUE SCAN          | ROW_MIG1_PK  |     1 |     0 |
|* 6 |   TABLE ACCESS BY INDEX ROWID | ROW_MIG1     |     1 |     1 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D2"."FILTER"=0)
   5 - access("D1"."X"="D2"."X")
   6 - filter("D1"."FILTER"=0)

Statistics
----------------------------------------------------------
        913  recursive calls
          0  db block gets
      20695  consistent gets
      12817  physical reads
          0  redo size
      39012  bytes sent via SQL*Net to client
        837  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
        577  rows processed

The more representative clustering factor makes the optimizer take the expected plan. The filtering takes place on the more selective table first, which matches about 5900 rows. The other table is joined in later. The execution is about 13% faster, the logical and physical gets dropped by about 40% both. That makes quite a difference.

The cost of the TABLE ACCESS BY INDEX ROWID that follows the INDEX RANGE SCAN reflects the clustering factor’s impact. The second query plan has a cost of about 5900, that actually means that each fetched row will need a block read. The original execution plan had a cost value of 938 for that step, so that the overall cost value was lower.

After that I must remind the reader that the rows in table ROW_MIG1 are still migrated. The performance difference is not cause by the row migration per se, but by the misleading statistics that result out of the row migration.

Correcting the Row Migration

To complete the exercise, I will correct the row migration, run the statement again, and compare the performance improvement:

SQL> ALTER TABLE row_mig1 MOVE;

Table altered.

SQL> ALTER INDEX row_mig1_pk REBUILD;

Index altered.

SQL> ALTER INDEX row_mig1_idx REBUILD;

Index altered.

SQL> BEGIN
       DBMS_STATS.GATHER_TABLE_STATS(null, 'ROW_MIG1',CASCADE=>TRUE);
     END;
     /

PL/SQL procedure successfully completed.

SQL> SELECT i.index_name         index_name
          , t.blocks             table_blocks
          , i.clustering_factor  index_clustering_factor
          , i.num_rows           index_rows
       FROM user_indexes i
       JOIN user_tables t USING (table_name)
      WHERE table_name = 'ROW_MIG1';

INDEX_NAME     TABLE_BLOCKS INDEX_CLUSTERING_FACTOR INDEX_ROWS
-------------- ------------ ----------------------- ----------
ROW_MIG1_PK          100506                  100000     100000
ROW_MIG1_IDX         100506                  100000     100000

SQL> 

Then execute the statement again:

SET AUTOTRACE TRACEONLY;
SET TIMING ON;

SELECT /* no row migration */ *
  FROM row_mig1 d1
  JOIN row_mig2 d2 ON (d1.x = d2.x)
 WHERE d1.filter = 0
   AND d2.filter = 0;

SET TIMING OFF
SET AUTOTRACE OFF

Because the statistics are almost identical, the plan doesn’t change, nor does the cost. What does changed is the execution time as well as the number of logical gets:

577 rows selected.

Elapsed: 00:00:30.90

Execution Plan
----------------------------------------------------------
Plan hash value: 3004301745

---------------------------------------------------------------------
| Id | Operation                     | Name         | Rows  | Cost  |
---------------------------------------------------------------------
|  0 | SELECT STATEMENT              |              |  5882 | 11780 |
|  1 |  NESTED LOOPS                 |              |       |       |
|  2 |   NESTED LOOPS                |              |  5882 | 11780 |
|  3 |    TABLE ACCESS BY INDEX ROWID| ROW_MIG2     |  5882 |  5896 |
|* 4 |     INDEX RANGE SCAN          | ROW_MIG2_IDX |  5882 |    12 |
|* 5 |    INDEX UNIQUE SCAN          | ROW_MIG1_PK  |     1 |     0 |
|* 6 |   TABLE ACCESS BY INDEX ROWID | ROW_MIG1     |     1 |     1 |
---------------------------------------------------------------------            

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D2"."FILTER"=0)
   5 - access("D1"."X"="D2"."X")
   6 - filter("D1"."FILTER"=0)

Statistics
----------------------------------------------------------
        925  recursive calls
          0  db block gets
      14271  consistent gets
      11952  physical reads
          0  redo size
      39012  bytes sent via SQL*Net to client
        837  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
         13  sorts (memory)
          0  sorts (disk)
        577  rows processed

Summary

The article investigates the effects of row migration on the clustering factor and the optimizer. A “proof of concept” SQL demonstrates that row migration can affect the optimizer. The lessons from this article are:

  • The “insert empty, update everything” pattern can lead to a very high row migration rate.
  • DBMS_STATS doesn’t populate the CHAIN_CNT.
  • A high row migration rate can cut the clustering factor, even below its theoretic minimum.
  • A wrong clustering factor can affect the optimizer and result in an suboptimal plan.

Disclaimer

Please note that this trap was intentionally built, just to proof that row migration can potentially influence the optimizer. There are better ways to tune that particular SQL, foremost a better indexing approach.

I have put many adjectives in quotation marks because they are not in line with the technical definition of the respective noun.

The “insert empty, update everything“ anti-pattern was used to create a very high row migration rate. Although that anti-pattern can lead to 100% migration rate, it does not always.

I have successfully verified my results on Oracle 10gR1, 11gR1 and 11gR2.

Thanks

Thanks to the guys at 25th-floor who verified my results on Oracle 11gR1.


V$SESSION_LONGOPS – Where’s My SQL Statement

March 9, 2010

A question recently appeared on the OTN forums that stated the following:

I have a query joining 4 tables and it returns 48 million. I have used /+parallel(tablename,4) */ hint with the select clause. I want to the 48 million returned by the query to another table t2.

I do get the select result in 20 minutes. when I am performing insert, I checked gv$session_longops. I usually check like select * from gv$session_longops where time_remaining>0.

It didnt return anything. I removed the where clause and rather gave the sid.(i found the sql_id from gv$session)

select * from gv$session_longops where sql_id=’XXXXXXX’;

I got 3 rows but time_remaining column is 0. Could someone explain why this is so.

OS: Linux
DB:11.1.0.6.0 – 11g
Rac- 2 node RAC

So what is happening, why does GV$SESSION_LONGOPS show 3 rows for the SQL statement, all with the TIME_REMAINING column set to 0?  This is a good question, but I do not know if there is a straight-forward answer for the question.  Possibly one of the first questions to ask is: What is the SQL statement, and what does its execution plan look like?

It probably makes sense to check the documentation for the GV$SESSION_LONGOPS view (or more accurately, the documentation for the single instance V$SESSION_LONGOPS view):

“V$SESSION_LONGOPS displays the status of various operations that run for longer than 6 seconds (in absolute time). These operations currently include many backup and recovery functions, statistics gathering, and query execution, and more operations are added for every Oracle release.

To monitor query execution progress, you must be using the cost-based optimizer and you must:

  • Set the TIMED_STATISTICS or SQL_TRACE parameters to true
  • Gather statistics for your objects with the ANALYZE statement or the DBMS_STATS package”

From the above, having the TIMED_STATISTICS parameter set to FALSE or failing to gather the statistics for the objects using DBMS_STATS are two possible reasons for the TIME_REMAINING column being set to 0.  Any other ideas?  Where is that SQL statement and execution plan when you need it?  Why ask for the execution plan?  From the documentation link you will notice that the view definition includes the SQL_PLAN_LINE_ID column, which refers to a specific line in the execution plan for the query.  From this bit of information, we just need to define what is considered an operation (and does it have to be an uninterrupted operation) – without parallel query we might be able to perform a couple of tests.  The definition of an operation and the 6 second rule might be very important.

We will use tables T3 and T4 from yesterday’s blog article since the original poster did not provide a SQL statement to reproduce the problem.  Consider the following query:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY')
  AND T3.C2=T4.C2;

If we set the STATISTICS_LEVEL parameter to ALL at the session level, run the SQL statement, and then generate the DBMS_XPLAN for the last execution, we may see an execution plan that looks something like this:

SQL_ID  2d4f5x92axqgn, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY')    AND T3.C2=T4.C2

Plan hash value: 1396201636

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    874K|00:03:31.20 |    3771K|   3763K|       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    849K|    874K|00:03:31.20 |    3771K|   3763K|    33M|  5591K|   50M (0)|
|*  2 |   TABLE ACCESS FULL| T3   |      1 |    802K|    795K|00:02:32.69 |    2743K|   2743K|       |       |          |
|   3 |   TABLE ACCESS FULL| T4   |      1 |     25M|     25M|00:00:35.93 |    1028K|   1020K|       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   2 - filter((TRUNC(INTERNAL_FUNCTION("C2"))>=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TRUNC(INTERNAL_FUNCTION("C2"))<=TO_DATE(' 2009-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

From the above, we are able to see that Oracle is reporting that the full table scan of table T3 required roughly 152.69 seconds and the full table scan of table T4 required roughly 35.93 seconds.  Assume that in another session we executed the following SQL statement roughly every 6 seconds while the above SQL statement was in the process of being executed:

SELECT
  SQL_ID,
  SQL_PLAN_HASH_VALUE HASH_VALUE,
  SQL_PLAN_LINE_ID LINE_ID,
  OPNAME,
  TARGET,
  TARGET_DESC,
  TIME_REMAINING,
  ELAPSED_SECONDS,
  SID,
  SERIAL#,
  USERNAME
FROM
  V$SESSION_LONGOPS
WHERE
  TIME_REMAINING>0;

The output of the above SQL statement might look something like the following:

SQL_ID        HASH_VALUE    LINE_ID OPNAME          TARGET          TARGET_DESC  TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL#  USERNAME
------------- ---------- ---------- --------------- --------------- ------------ -------------- --------------- ---------- ----------  --------
2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                             104              53        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              92              62        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              82              73        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              74              81        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              59              95        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              52             101        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              41             112        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              33             120        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              23             131        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                              14             140        159        909  TESTUSER

2d4f5x92axqgn 1396201636          2 Table Scan      TESTUSER.T3                               5             148        159        909  TESTUSER

no rows selected

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                              57              10        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                              42              18        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                              27              31        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                              18              41        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                               6              53        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                               6              57        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                               6              63        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                               6              70        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                               6              75        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                               6              81        159        909  TESTUSER

2d4f5x92axqgn 1396201636          3 Table Scan      TESTUSER.T4                               6              86        159        909  TESTUSER

no rows selected

From the above output it appears that the full table scan of table T3 required between 148 and 154 seconds.  At the point where the full table scan of T3 completed and the first couple of seconds (probably 6) of the full table scan of T4, the SQL statement executed in the second session returned no rows - after roughly 6 seconds of reading table T4, the query of V$SESSION_LONGOPS started reporting that table T4 was being read.  From the above it appears that the read of table T4 required between 86 and 94 seconds, yet that does not agree with the DBMS_XPLAN output – it appears that in the V$SESSION_LONGOPS output the hash join operation’s time is being reported as part of this elapsed time for reading table T4.

So, we have found a third possible reason for the rows in V$SESSION_LONGOPS appearing as 0 for the OP’s SQL statement – the line in the plan had not yet consumed 6 seconds of time, or the operation had already completed.

Let’s look at one of the other SQL statements from yesterday’s blog article:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  T3.C2 BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)
  AND T3.C2=T4.C2;

This statement is equivalent to the previous SQL statement, except that it is able to use the index that exists on table T3.  The execution plan follows:

SQL_ID  539d93k50ruz3, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE    T3.C2
BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)    AND T3.C2=T4.C2

Plan hash value: 1243183227

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |      1 |        |    874K|00:00:05.19 |   85051 |       |       |          |
|   1 |  MERGE JOIN                   |           |      1 |    795K|    874K|00:00:05.19 |   85051 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T4        |      1 |    795K|    795K|00:00:01.69 |   51097 |       |       |          |
|*  3 |    INDEX RANGE SCAN           | IND_T4_C2 |      1 |    795K|    795K|00:00:00.42 |   10841 |       |       |          |
|*  4 |   SORT JOIN                   |           |    795K|    795K|    874K|00:00:02.11 |   33954 |    30M|  1977K|   26M (0)|
|   5 |    TABLE ACCESS BY INDEX ROWID| T3        |      1 |    795K|    795K|00:00:00.58 |   33954 |       |       |          |
|*  6 |     INDEX RANGE SCAN          | IND_T3_C2 |      1 |    795K|    795K|00:00:00.18 |    2114 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T4"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("T3"."C2"="T4"."C2")
       filter("T3"."C2"="T4"."C2")
   6 - access("T3"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

The above SQL statement completed too quickly to report any time in the V$SESSION_LONGOPS view, so let’s set the ‘08-MAR-2009′ date to ‘08-MAR-2007′ and re-run the SQL statement.  For quite a bit of time the query of V$SESSION_LONGOPS only returned the following:

no rows selected

But the SQL statement was executing, why did it not appear in V$SESSION_LONGOPS – where is that link for reporting a bug on Metalink?  :-)   Or maybe, just maybe, Oracle is not performing any single operation in the execution plan continuously for more than 6 seconds (reading the index at ID 3 followed by the corresponding rows at ID 2, and then back to the index at ID 3 again).

Eventually, the following appeared after what must have been a minute or two:

SQL_ID        HASH_VALUE    LINE_ID OPNAME          TARGET          TARGET_DESC  TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- --------------- --------------- ------------ -------------- --------------- ---------- ---------- --------
f3scvt222bu0t 1243183227          4 Sort Output                                            1194              12        159        909 TESTUSER

f3scvt222bu0t 1243183227          4 Sort Output                                            2115              22        159        909 TESTUSER

f3scvt222bu0t 1243183227          4 Sort Output                                            2115              22        159        909 TESTUSER
...
(in the above the last row continued to repeat with ELAPSED_SECONDS displayed as 22 for a minute or two)

Ready, aim,… hey – what is the TARGET.  The LINE_ID column (the SQL_PLAN_LINE_ID column in V$SESSION_LONGOPS) reported that Oracle is executing ID 4 of the execution plan.  I can’t explain why the last line remained in the V$SESSION_LONGOPS output for a minute or two (maybe someone else knows why).

So, we have found a fourth possible reason for the rows in V$SESSION_LONGOPS appearing as 0 for the OP’s SQL statement – it could be that Oracle was alternating between an index range scan and the retrieval of the matching rows from the corresponding table.

Let’s try executing the first query with a parallel hint to see what happens to the V$SESSION_LONGOPS output:

SELECT /*+ PARALLEL (T3, 4) PARALLEL (T4, 4) */
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY')
  AND T3.C2=T4.C2;

The DBMS_XPLAN output for the query follows:

SQL_ID  dywt9v0xuvgyv, child number 0
-------------------------------------
SELECT /*+ PARALLEL (T3, 4) PARALLEL (T4, 4) */    T3.C1,    T3.C2,   
T4.C3  FROM    T3,    T4  WHERE    TRUNC(T3.C2) BETWEEN
TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY')    AND T3.C2=T4.C2

Plan hash value: 1800244878

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |    874K|00:03:49.60 |      14 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |    874K|00:03:49.60 |      14 |       |       |          |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |      0 |    849K|      0 |00:00:00.01 |       0 |       |       |          |
|*  3 |    HASH JOIN            |          |      0 |    849K|      0 |00:00:00.01 |       0 |   126M|    10M|   49M (0)|
|   4 |     PX RECEIVE          |          |      0 |    802K|      0 |00:00:00.01 |       0 |       |       |          |
|   5 |      PX SEND BROADCAST  | :TQ10000 |      0 |    802K|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      0 |    802K|      0 |00:00:00.01 |       0 |       |       |          |
|*  7 |        TABLE ACCESS FULL| T3       |      0 |    802K|      0 |00:00:00.01 |       0 |       |       |          |
|   8 |     PX BLOCK ITERATOR   |          |      0 |     25M|      0 |00:00:00.01 |       0 |       |       |          |
|*  9 |      TABLE ACCESS FULL  | T4       |      0 |     25M|      0 |00:00:00.01 |       0 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T3"."C2"="T4"."C2")
   7 - access(:Z>=:Z AND :Z<=:Z)
       filter((TRUNC(INTERNAL_FUNCTION("C2"))>=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TRUNC(INTERNAL_FUNCTION("C2"))<=TO_DATE(' 2009-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
   9 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T4"."C2"))

A bloom filter, just in time for Spring (it must be true, it is in the Predicate Information section of the actual execution plan).

After roughly a 6 second delay, V$SESSION_LONGOPS was queried roughly every 6 seconds.  The output of the V$SESSION_LONGOPS query follows:

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              8               7        553        243 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              4               7         80        544 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                            114               7          7        251 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                             14               7        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              2              16        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              4               9        553        243 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              1              13         80        544 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               9          7        251 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                             13               9         80        544 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              2              12        553        243 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7              10          7        251 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              6               7        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               7        553        243 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              4              10         80        544 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              2              13        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              4               9        553        243 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              6              10          7        251 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                             10               8         80        544 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              2              12        553        243 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              1              16         80        544 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              3              10          7        251 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              6              10        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              6               8        553        243 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              8               8         80        544 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              3              12        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              4              10        553        243 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              5              10         80        544 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              5               8          7        251 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               9        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               9        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               9        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               9        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               9        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               9        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              7               9        159        982 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                              2              57         80        544 TESTUSER
dywt9v0xuvgyv 1800244878          7 Rowid Range Scan TESTUSER.T3                             20              66        159        982 TESTUSER

SQL> /

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                              9               8         81       1204 TESTUSER
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                              4               7        395        631 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                             69              13        160         32 TESTUSER
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                             12              16         81       1204 TESTUSER
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                              5              15        395        631 TESTUSER
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                             71              13        238         31 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                             11              25         81       1204 TESTUSER
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                              4              24        395        631 TESTUSER
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                             58              22        238         31 TESTUSER

SQL> /

SQL_ID        HASH_VALUE    LINE_ID OPNAME           TARGET          TARGET_DESC TIME_REMAINING ELAPSED_SECONDS        SID    SERIAL# USERNAME
------------- ---------- ---------- ---------------- --------------- ----------- -------------- --------------- ---------- ---------- --------
dywt9v0xuvgyv 1800244878          9 Rowid Range Scan TESTUSER.T4                              7              33         81       1204 TESTUSER

SQL> /

no rows selected

As you can see from the above, the parallel query processes (each with a different SID) appear, and disappear from the V$SESSION_LONGOPS output as the query executes.  The output shows a roughly 40 to 50 second time period during the execution of the query in the first session where nothing appeared in the V$SESSION_LONGOPS output.

So, we have found a fifth possible reason for the rows in V$SESSION_LONGOPS appearing as 0 for the OP’s SQL statement – the OP may have just queried V$SESSION_LONGOPS at the wrong time.

Any other possible reasons why the OP would see 0 for the TIME_REMAINING column when querying V$SESSION_LONGOPS during the execution of a SQL statement that is expected to require 20 minutes to complete?


On “Is a computer science degree a good goal?”

Dan Fink's "Is a computer science degree a good goal?" has gotten my wheels going. I think it's important to note this:
Computer Science ≠ Information Technology
Not only are these two disciplines not equal, neither is a subset of the other.

One of my most memorable culture shocks coming out of school into the Oracle domain was how many people didn't understand the difference between computer science, which is a specialized branch of mathematics, and information technology, which is a specialized branch of business administration. They both deal with computers (the IT major more than the CS one, actually), so of course there's risk that people will miss the distinction.

Over dinner Friday night with some of my friends from Percona, we touched on one of the problems. It's difficult for a technical major in school to explain even to his family and friends back home what he's studying. I remember saying once during my senior year as a math major, "I haven't seen a number bigger than 1 since I was a sophomore." I heard a new one tonight: "I got to the level where the only numbers in my math books were the page numbers."

It's difficult for people who don't study computer science to understand who you are or how the min/max kd-trees and deterministic finite automata and predicate calculus and closures that you're studying are different from the COBOL and SQL and MTBFs and ITIL that the IT majors are studying. It's easy to see why laypeople don't understand how these sets of topics arrange into distinctly different categories. What continually surprises me is how often even IT specialists don't understand the distinction. I guess even the computer science graduates soften that distinction when they take jobs doing tasks (to make a living, of course) that will be automated within ten years by other computer scientist graduates.

I agree with Dan and the comments from Tim, Robyn, Noons, Gary, and David about where the IT career path is ultimately headed in the general case. What I don't believe is that the only career path for computer scientists and mathematicians is IT. It's certainly not the only career path for the ones who can actually create things.

I believe that college (by which I mean "University" in the European sense) is a place where the most valuable skill you learn is how to learn, and that, no matter what your major, as long as you work hard and apply yourself to overcoming the difficult challenges, there will be things in this world for you to do to earn your way.

I really hope that the net effect of a depressed, broken, and downward-trending IT industry is not that it further discourages kids from engaging in math and computer science studies in school. But I don't want for so many of our kids today who'll be our adults of tomorrow to become just compartmentalized, highly specialized robots with devastatingly good skills at things that nobody's really willing to pay good money for. I think that the successful human of the future will need to be able to invent, design, create, empathize, teach, see (really see), listen (not just hear), learn, adapt, and solve.

...Just exactly like the successful human of the past.

Hotsos 2010 – Day 2 – The conference begins

A 3:30 start, which gave me lots of time to work on my demos and by breakfast time at 7:30, I felt in a reasonable place (but not quite done yet) and headed down to eat ... wait for it ... some fruit! Then again, I'd eaten so little the previous day that I had to eat something.

First up was Gary Goodman opening the event and introducing Tom Kyte's keynote with :-

a) A video of Tom and Hotsos regular Patty, Disco Dancing to mark tomorrow night's Disco theme. Personally, I thought it was hilarious and I hope it gets posted online somewhere. I'm not sure everyone found it as funny, but it did the trick for me.

b) A gift of a custom set of poker chips because I believe Tom's a player (of poker, that is). Tom's keynote was about (looks for title in notes and can't find it) the mistakes we make *because* of our experience and our assumptions. It was thoroughly entertaining and kicked off with a quiz for four volunteer participants who had to answer 12 apparently simple questions very quickly. They were trick questions, of course ;-) Rather than me taking voluminous notes, I'm hoping that Tom might post the questions on his blog (hint, hint) even though they were meant to be answered quickly based on verbal questioning.

Oh, and I couldn't believe it when my phone started ringing, for someone who's normally exceptionally careful about that stuff. I had it on silent last night, missed a text from Alex G and so had taken it off silent. Clown! (In retrospect, this should have been a warning sign.)

There were tons of great examples of where clever people have gone spectacularly wrong through over-confidence. We're often wrong when we answer quickly based on our prior experience because things change. I suspect I sometimes frustrate colleagues by not giving the snappy answers they expect, but I know that Oracle stuff is often not as obvious as it seems at first sight. There was even a sighting of Martin Widlake's "Making Things Better Makes Things Worse"  but no name-check, Martin, so you're only *almost* famous ;-)

I particularly enjoyed the video I hadn't seen of Richard Feynmann talking about the uncomfortable state of confusion and feeling stupid. (With hindsight, why did I not see the warning signs ...)

Then I skipped two sessions I was looking forward to but, as I've said before, I need to take care of my own presenting business before I can enjoy and learn from other people's presentations. However, I bumped into a friend after Richard Foote's session and he thought it was amazing. My friends first trip to Hotsos is going well.

Sitting in my room, I was really happy with how the demos were looking, raring to go and went to iron my shirt. Damn, the iron wasn't working and I really didn't have time to wait for another to show up. No problem, though, because I always carry a special shirt that *really* doesn't need ironing - just hang it up in the bathroom with the shower on. (Another warning sign, though?)

I'm very superstitious about using any new kit for the first time, so I was still nervous about a couple of things with the demos

1) Performance, because I know I'd run them all weekend and have a new powerful laptop (more on that another time), but the demos are designed to hammer the machine. So I thought I'd disable everything I didn't need running.

2) Driving the projector because I'd never driven one with the new laptop. I arranged to check that my laptop would drive the projector ok. It seemed to, but one of the demos was behaving strangely. No time to investigate because Dan Norris needed the podium, so I decanted to Alex's Battle Against Any Guess presentation because we'd been discussing it, I know he was concerned about how it would go and wanted to show a little moral support.

In the event, he needn't have worried. Personally, I find his Russian accent all but impenetrable, particularly after we've had a few beers and are both shouting at each other, but I'm sure the feeling's mutual! But the fact that he had the room rolling about with laughter at regular intervals meant it had gone very well! I hope he still does plenty of technical presentations with lots of demos, but he proved he can do something a little more conceptual and message-driven. Better still, I'd been frantically trying to get my demos working at the back of the hall by re-enabling services I'd disabled and they started to work :-) I turned round to Cary Millsap who was hiding their too and dramatically mimed wiping the sweat away from my brow. Time to head back into the other room and watch the tail-end of Dan Norris' Database Machine presentation. I didn't see nearly enough and was completely pre-occupied by what was to come, so will need to catch up on the slides later.

Some interesting Q & A at the end which were slightly less interesting for me because my demos started playing up again! So, a room full of people, I'm rebooting my laptop, nothing is working and then, bang, it's time!

Whilst my presentation was originally part of this post, my review is so long-winded and personal, I've moved it to another post that people can choose to skip over! That way it doesn't detract from the good presentations I saw yesterday, such as Kevin Closson's "Ten Years After Y2K And We Still "Party Like It's 1999"".

Excellent, as always, and packed full of information, it also sort of tied up the way the day opened, albeit in a very different tone and style that things change. Kevin has the perspective of having been in the industry a long time, so he's seen the past, but also being able to see where we are now and where we're going, so he concentrated on a few of his pet subjects that I'm pretty sure you'll see cropping up on his blog in the near future but, to give you a taster, he talked about the merits of SMT on Nehalem EP chips for different workloads, which is something we'd been talking about at the opening reception. I'm pretty sure I can't disable it on my laptop to give it a try, but I might have a word with him about it anyway. He talked about Flash, Direct-attached storage, NFS, virtualisation and all that good stuff as always but because I missed the first 10 min for a much-needed break and couldn't read the slides because I forgot my glasses, I'm going to have to try to hit on him for a copy of his slides later! Oh, and I was wondering why the presentation was well attended but not packed. I'd forgotten Tom Kyte was speaking next door. I was happy with my choice.

I did have one more presentation scheduled, to see the first of Kerry Osborne's but I was pretty low at this stage and sleep deprivation was kicking in, so I headed back to my room to catch up with some blogging and tweeting (thanks for the support) and then it was time for dinner with the Oak Table crew. Very nice it was, too, hats off to Marco and Carol for choosing that place, the company of all and for Carol's consistently top-notch organisation and geek-ferrying. Nice big lump of beautiful steak, some chips and a beer. That's more like it! A few more drinks back at the bar but I made my excuses and left at 11:30 despite some vain attempts at hypnotising me into staying! ;-)

Oh, but the weather was still rubbish all day :-(

Regardless of everything that happened yesterday, I now have 3 days of no responsibility and can just learn, socialise, eat and perhaps sleep a bit more. I woke up at 3:30 again this morning, but turned over and managed another 2 hours. Bliss!

Hotsos 2010 – Day 1.79 – Friends show up

Hard at work preparing for my presentation yesterday until it got nearer to 5pm and people started coming in on their flights. First up was Kevin Closson, who looked a somewhat sad sight (by his usual standards), clutching a bag from a PC store containing what he hoped would be the crucial bit of software that would help him solve his over-written MBR. I waited for him in the bar and discovered Mark Bobac had Ubuntu Live on a USB stick. I suppose I could have gone and told Kevin, but I knew he'd be down imminently ;-)

It was also a brilliant experience to bump into Paul Matuszyk again, who I worked with briefly 4 years ago at Sky. This is his first conference, largely inspired by a set of Symposium 2006 notes I gave him and my encouragement so it was a) great to see him again and b) pleasing when I saw him coming out of Richard Foote's presentation, blown away by how good it was. I think he's already decided it was a good decision ;-)

Then it was the usual suspects, the usual drinking and some chat with KC, who always has smart stuff to say. In amongst that, I liked him pointing out that my new iCore 7 laptop is probably what they used to euphemistically call a 'Portable' computer. He's not wrong there ;-)

Not too late to bed and really looking forward to the first day of the conference and my presentation.

Inserts Experiencing an Increasing CPU Consumption

Last week I had to analyze a strange performance problem. Since the cause/solution was somehow surprising, at least for me, I thought to share it with you. Let me start by quickly describing the setup and what was done to reproduce the problem: Database version: Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 (64-bit) Operating system: Solaris 10 (SPARC) To [...]

International Women’s Day

If you do not know what International Women’s Day is: http://www.internationalwomensday.com/

Start planning your blog posts for Ada Lovelace day now (March 24th, http://findingada.com/ Ada Lovelace Day is an international day of blogging (videologging, podcasting, comic drawing etc.!) to draw attention to the achievements of women in technology and science.)

To that end, I would like to point out all the women currently in science and tech fields that I admire and think are doing great things. I think it would be great if everyone, male or female, made a list like this:

The women that have taught me science/tech along the way:

High School:
Mary Lou Ciavarra (Physics)
Maria Petretti (Pre-Algebra, and Academic Decathlon)
Reneé Fishman (Biology)
Lisa Acquaire (Economics during Academic Decathlon)

College:
Professor Kalpana White (Biology), and in whose fruit fly lab I worked for 2 semesters.
Professor Eve Marder (Introductory Neuroscience)

Though Brandeis does have female faculty in the Computer Science department, I did not manage to have any classes with female Computer Science faculty members.

My current female DBA co-workers at Pythian: Isabel Pinarci (Oracle), Michelle Gutzait (SQL Server), Catherine Chow (Oracle) and Jasmine Wen (Oracle).

And to folks in the greater MySQL/tech community and tech co-workers past and present, especially those I have been inspired and helped by: Tracy Gangwer, Selena Deckelmann (Postgres), Amy Rich, Anne Cross, and more (If I have forgotten you, I apologize!).

ORA-15477: cannot communicate with the volume driver

ORA-15477: cannot communicate with the volume driver
when add volume...
SQL> alter diskgroup data add volume 'asm_vol1' size 5G;
alter diskgroup data add volume 'asm_vol1' size 5G
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15477: cannot communicate with the volume driver

SQL> alter diskgroup data set attribute 'compatible.advm'='11.2';
alter diskgroup data set attribute 'compatible.advm'='11.2'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15242: could not set attribute compatible.advm
ORA-15238: 11.2 is not a valid value for attribute compatible.advm
ORA-15477: cannot communicate with the volume driver
# acfsutil registry -l
acfsutil registry: CLSU-00100: Operating System function: open64 (/dev/ofsctl) failed with error data: 2
acfsutil registry: CLSU-00101: Operating System error message: No such file or directory
acfsutil registry: CLSU-00103: error location: OOF_1
acfsutil registry: ACFS-00502: Failed to communicate with the ACFS driver. Verify the ACFS driver has been loaded.
then load ACFS.
# /u01/app/grid/bin/acfsload start -s
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9322: done.
Try again.
SQL> alter diskgroup data set attribute 'compatible.advm'='11.2';

Diskgroup altered.

SQL> alter diskgroup data add volume 'asm_vol1' size 5G;

Diskgroup altered.

SQL> !ls -l /dev/asm/asm_v*
brwxrwx--- 1 root oinstall 252, 86529 Mar 9 00:11 /dev/asm/asm_vol1-169
Written By: Surachart Opun http://surachartopun.com

ORA-15041 – ASM Disk Group Unbalanced

Use ASM Disk Group NORMAL redundancy (11gR2) and found error while create tablespace.
SQL> create tablespace test datafile '+DATA' size 5G;
create tablespace test datafile '+DATA' size 5G
*
ERROR at line 1:
ORA-01119: error in creating database file '+DATA'
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup "DATA" space exhausted
What Wrong??? - I have free size on "DATA", Check ASM Disk Group.
SQL> select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb ,type from v$asm_diskgroup where name='DATA';

NAME TOTAL_MB FREE_MB REQUIRED_MIRROR_FREE_MB USABLE_FILE_MB TYPE
------------------------------ ---------- ---------- ----------------------- -------------- ------
DATA 257807 112233 23437 44398 NORMAL
Check (ASM) alert log file.
kfdpDumpBg()
kfdpDumpBg() - Done
Check on ASM Instance and Investigate the problem.
SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 1017 23437 NORMAL
/dev/sdb 7867 23437 NORMAL
/dev/sdk 7988 23437 NORMAL
/dev/sdq 23211 23437 NORMAL
/dev/sdf 7864 23437 NORMAL
/dev/sdg 14819 23437 NORMAL
/dev/sdc 13827 23437 NORMAL
/dev/sdi 6971 23437 NORMAL
/dev/sde 14829 23437 NORMAL
/dev/sdj 1 23437 NORMAL
/dev/sdd 13839 23437 NORMAL
Disk Group UNBALANCE, then
SQL> alter diskgroup data rebalance power 11;

Diskgroup altered.
Check.
SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 1018 23437 NORMAL
/dev/sdb 7867 23437 NORMAL
/dev/sdk 7988 23437 NORMAL
/dev/sdq 23209 23437 NORMAL
/dev/sdf 7864 23437 NORMAL
/dev/sdg 14820 23437 NORMAL
/dev/sdc 13827 23437 NORMAL
/dev/sdi 6971 23437 NORMAL
/dev/sde 14830 23437 NORMAL
/dev/sdj 0 23437 NORMAL
/dev/sdd 13839 23437 NORMAL
Nothing to resolve... Check on v$operation.
SQL> select group_number, operation, state, error_code from v$asm_operation;

GROUP_NUMBER OPERA STAT ERROR_CODE
------------ ----- ---- --------------------------------------------
1 REBAL ERRS ORA-15041
Try to check & repair ASM Disk Group.
SQL> alter diskgroup data check all repair;

Diskgroup altered.

SQL> select group_number, operation, state, error_code from v$asm_operation;

GROUP_NUMBER OPERA STAT ERROR_CODE
------------ ----- ---- --------------------------------------------
1 REBAL ERRS ORA-15041

SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 1018 23437 NORMAL
/dev/sdb 7867 23437 NORMAL
/dev/sdk 7988 23437 NORMAL
/dev/sdq 23209 23437 NORMAL
/dev/sdf 7864 23437 NORMAL
/dev/sdg 14820 23437 NORMAL
/dev/sdc 13827 23437 NORMAL
/dev/sdi 6971 23437 NORMAL
/dev/sde 14830 23437 NORMAL
/dev/sdj 0 23437 NORMAL
/dev/sdd 13839 23437 NORMAL
Nothing to resolve... - "DATA" ASM Disk Group still unbalance, and v$asm_operation show error.
think & make Idea: Move some files or resize some files on "DATA" Disk Group, so login database and make something.
Example:
SQL> alter database tempfile '+DATA/orcl/tempfile/temp.263.712603677' resize 1G;

Database altered.
Check on ASM again.
SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 1685 23437 NORMAL
/dev/sdb 8322 23437 NORMAL
/dev/sdk 8357 23437 NORMAL
/dev/sdq 22855 23437 NORMAL
/dev/sdf 8324 23437 NORMAL
/dev/sdg 15013 23437 NORMAL
/dev/sdc 14043 23437 NORMAL
/dev/sdi 7386 23437 NORMAL
/dev/sde 15011 23437 NORMAL
/dev/sdj 711 23437 NORMAL
/dev/sdd 14054 23437 NORMAL
Check v$asm_operation.
SQL> select group_number, operation, state, power, error_code from v$asm_operation;

GROUP_NUMBER OPERA STAT ERROR_CODE
------------ ----- ---- --------------------------------------------
1 REBAL RUN
ASM instance... "DATA" Disk Group rebalancing, So Wait... and check
SQL> select group_number, operation, state, power, error_code from v$asm_operation;

no rows selected
SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 10524 23437 NORMAL
/dev/sdb 10524 23437 NORMAL
/dev/sdk 10528 23437 NORMAL
/dev/sdq 10532 23437 NORMAL
/dev/sdf 10521 23437 NORMAL
/dev/sdg 10523 23437 NORMAL
/dev/sdc 10522 23437 NORMAL
/dev/sdi 10526 23437 NORMAL
/dev/sde 10522 23437 NORMAL
/dev/sdj 10522 23437 NORMAL
/dev/sdd 10523 23437 NORMAL

SQL> alter diskgroup data rebalance power 1;

Diskgroup altered
Disks rebalanced, Try to create tablespace again.
SQL> select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb ,type from v$asm_diskgroup where name='DATA';

NAME TOTAL_MB FREE_MB REQUIRED_MIRROR_FREE_MB USABLE_FILE_MB TYPE
------------------------------ ---------- ---------- ----------------------- -------------- ------
DATA 257807 115767 12916 51425 NORMAL

SQL> create tablespace test datafile '+DATA' size 30G;

Tablespace created.

SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 4936 23437 NORMAL
/dev/sdb 4931 23437 NORMAL
/dev/sdk 4931 23437 NORMAL
/dev/sdq 4932 23437 NORMAL
/dev/sdf 4922 23437 NORMAL
/dev/sdg 4933 23437 NORMAL
/dev/sdc 4936 23437 NORMAL
/dev/sdi 4940 23437 NORMAL
/dev/sde 4933 23437 NORMAL
/dev/sdj 4928 23437 NORMAL
/dev/sdd 4938 23437 NORMAL
What this told me? Make Sure ASM Disk Group no problem (REBALANCE).
Written By: Surachart Opun http://surachartopun.com

Cloud computing definition.

Rajeev | Mar 8, 2010 08:08 +0000

The National Institute of Standards and technology has a good, concise  definition of cloud computing. Sushil kumar of Oracle, was using the same language to define cloud computing in an article the current release of the oracle magazine.

Essential Charachteristics

  • On Demand Self-Service
  • Broad Network Access
  • Resource Pooling
  • Rapid Elasticity
  • Measured Service

Service Models

  • Cloud Software as a Service (SaaS)
  • Cloud Platform as a Service (PaaS)
  • Cloud Infrastructure as a Service (IaaS)

Deployment Models

  • Private Cloud
  • Community Cloud
  • Public Cloud
  • Hybrid Cloud

Congratulations to Mr. Sabdar and Mr. Ben Prusinski

Many congratulations to my fellow Oracle ACEs,  Mr. Syed Sabdar Hussain (my brother) and Mr. Ben Prusinski (friend and co-author of my upcoming Oracle 11g RAC book) whose short interview published in Oracle Magazine's (March/April 2010 edition) peer-to-peer section.

Impact of the TRUNC Function on an Indexed Date Column

March 8, 2010

A recent email from an ERP user’s group mailing list reminded me of a small problem in that ERP program’s modules related to the DATE columns in several of its tables.  In DATE columns that should only contain a date component, rows will occasionally be inserted by one of the ERP program’s modules with a date and time component, for example ‘08-MAR-2010 13:01:13′ rather than just ‘08-MAR-2010 00:00:00′.  This bug, or feature, leads to unexpected performance issues when normal B*Tree indexes are present on that date column.  To work around the time component in the DATE type columns, the ERP program modules frequently uses a technique like this to perform a comparisons on only the date component of a DATE type columns:

SELECT
  *
FROM
  T3
WHERE
  TRUNC(DATE_COLUMN) = :d1;

In the above D1 is a bind variable.  On occasion, the ERP program will instead pass in the date value as a constant/literal rather than as a bind variable.  What is wrong with the above syntax?  Would the above syntax be considered a bug if the DATE_COLUMN column had a normal B*Tree index?  Is there a better way to retrieve the required rows?  Incidentally, I started re-reading the book “Troubleshooting Oracle Performance” and I encountered a couple of interesting sentences on page 7 that seem to address this issue:

“For all intents and purposes, an application experiencing poor performance is no worse [should probably state no better] than an application failing to fulfill its functional requirements. In both situations, the application is useless.”

Let’s try a couple of experiments to see why the above SQL statement requires improvement.

First, we will create table T2 that will serve as a nearly sequential ordered row source with a small amount of randomization introduced into the data by the DBMS_RANDOM function.  This row source will be used to help duplicate the essentially random arrival rate of transactions into our T3 test table:

CREATE TABLE T2 AS
SELECT
  DBMS_RANDOM.VALUE(0,0.55555)+ROWNUM/10000 DAYS
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000) V2;

The date column in our T3 table is derived from SYSDATE, so ideally the rows should be in order by the DAYS column in table T2.  In a production environment, on rare occasion someone will slip in a row that is not in sequential order through an edit of the DATE column for a row, so we should be able to simulate that slight randomness by creating another table from table T2 before generating table T3 (the rows will fill the table blocks with an occasional row that is out of date sequence):

CREATE TABLE T2_ORDERED NOLOGGING AS
SELECT
  DAYS
FROM
  T2
ORDER BY
  DAYS;

For our simulation, we have a final problem that needs to be addressed.  The volume of data entered on a Saturday in the production database is less than that for a Monday through Friday, and the volume of data entered on a Sunday is less than that entered on a Saturday.  To add just a little more randomness, we will insert the rows into table T3 based on the following criteria:

  • 90% chance of a row from T2_ORDERED being included if the date falls on a Monday through Friday
  • 20% chance of a row from T2_ORDERED being included if the date falls on a Saturday
  • 10% chance of a row from T2_ORDERED being included if the date falls on a Sunday

The SQL statement to build table T3 follows:

CREATE TABLE T3 NOLOGGING AS
SELECT
  DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY') C1,
  DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY') C2,
  LPAD('A',255,'A') C3
FROM
  T2_ORDERED
WHERE
  DECODE(TO_CHAR(DAYS+TO_DATE('01-JAN-1990','DD-MON-YYYY'),'D'),'1',0.9,'7',0.8,0.1)<DBMS_RANDOM.VALUE(0,1);

CREATE INDEX IND_T3_C2 ON T3(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)

ALTER TABLE T3 MODIFY C1 NOT NULL;
ALTER TABLE T3 MODIFY C2 NOT NULL;

Let’s check the data distribution:

SELECT
  COUNT(*) NUM_ROWS,
  SUM(DECODE(TO_CHAR(C1,'D'),'6',1,0)) FRIDAYS,
  SUM(DECODE(TO_CHAR(C1,'D'),'7',1,0)) SATURDAYS,
  SUM(DECODE(TO_CHAR(C1,'D'),'1',1,0)) SUNDAYS
FROM
  T3;

  NUM_ROWS    FRIDAYS  SATURDAYS    SUNDAYS
---------- ---------- ---------- ----------
68,579,287 12,858,100  2,855,164  1,428,569

From the above we are able to determine that roughly 18.7% of the rows have a date that is on a Friday, roughly 4.2% of the rows have a date that is on a Saturday, and 2.1% of the rows are on a Sunday.

This test will be performed on Oracle Database 11.2.0.1 with the __DB_CACHE_SIZE hidden parameter floating at roughly 7,381,975,040 (6.875GB).  I will use my toy project for performance tuning to submit the SQL statements and retrieve the DBMS_XPLAN output, but the same could be accomplished with just SQL*Plus (most tests can also be performed using my Automated DBMS_XPLAN tool).

Let’s start simple, we will start with a simple SQL statement to retrieve the rows with today’s date (March 8, 2010) using literals against the indexed column.  I will execute each SQL statement twice to take advantage of any previously cached blocks in the buffer cache, and eliminate the time consumed by the hard parse:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY');

SQL_ID  3us49wsdzdun3, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')

Plan hash value: 4161002650

---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   9114 |00:02:31.93 |    2743K|   2743K|
|*  1 |  TABLE ACCESS FULL| T3   |      1 |   9114 |   9114 |00:02:31.93 |    2743K|   2743K|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TRUNC(INTERNAL_FUNCTION("C2"))=TO_DATE(' 2010-03-08 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

Roughly 2 minutes and 32 seconds.  Notice the Note at the bottom of the DBMS_XPLAN output, cardinality feedback (apparently not documented) is a new feature in Oracle Database 11.2.0.1 (see here for a related blog article).  The first execution required 2 minutes and 33 seconds, but a predicted cardinality of 685,000 rows (1% of the total) was returned for the first execution.  The second execution generated a second child cursor with a corrected cardinality estimate based on the actual number of rows returned during the first execution.  2 minutes and 32 seconds is not bad, unless this is an OLTP application and an end user is waiting for the application to return the rows.

Let’s try again with a modified, equivalent SQL statement, again executing the SQL statement twice:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2 < TO_DATE('08-MAR-2010','DD-MON-YYYY')+1;

SQL_ID  c7jfpa0rpt95a, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >=
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2 <
TO_DATE('08-MAR-2010','DD-MON-YYYY')+1

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6859 |   9114 |00:00:00.02 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6859 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

0.02 seconds compared to 2 minutes and 32 seconds.  You will notice that the estimated number of rows, while not exact, is reasonably close even without a cardinality feedback adjustment.  Also notice that the optimizer adjusted the date calculation that was in the WHERE clause of the SQL statement.

Let’s try again with a second modified, equivalent SQL statement, again executing the SQL statement twice:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

SQL_ID  7xthpspukrbtv, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 BETWEEN
TO_DATE('08-MAR-2010','DD-MON-YYYY')      AND
TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6860 |   9114 |00:00:00.02 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6860 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<=TO_DATE(' 2010-03-08 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

0.02 seconds again, and the estimated number of rows is roughly the same as we achieved with the previous SQL statement.  By checking the Predicate Information section of the DBMS_XPLAN output we see that the optimizer has transformed the BETWEEN syntax into roughly the same syntax as was used in the previous SQL statement.

Let’s try again with bind variables (the bind variable names are automatically changed by ADO into generic names, and that is why the bind variable appears in the execution plan as :1 rather than :d1):

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = :d1;

SQL_ID  cub25jm7y8zck, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) = :1

Plan hash value: 4161002650

---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |   9114 |00:02:33.37 |    2743K|   2743K|
|*  1 |  TABLE ACCESS FULL| T3   |      1 |    685K|   9114 |00:02:33.37 |    2743K|   2743K|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TRUNC(INTERNAL_FUNCTION("C2"))=:1)

Notice that our friendly note about Cardinality Feedback did not appear this time, and that the cardinality estimate was not corrected when the SQL statement was executed for the second time, even though bind variable peeking did happen.  The incorrect cardinality estimate would not have changed the execution plan for this SQL statement, but could impact the execution plan if table T3 were joined to another table.

Let’s try the other equivalent SQL statement with bind variables:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= :d1
  AND C2 < :d2 +1;

SQL_ID  9j2a54zbzb9cz, child number 0
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >= :1    AND C2 <
:2 +1

Plan hash value: 3025660695

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |   9114 |00:00:00.02 |     575 |
|*  1 |  FILTER                      |           |      1 |        |   9114 |00:00:00.02 |     575 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6859 |   9114 |00:00:00.02 |     575 |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6859 |   9114 |00:00:00.01 |     118 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:1<:2+1)
   3 - access("C2">=:1 AND "C2"<:2+1)

The optimizer estimated that 6,859 rows would be returned, just as it did when we used literals in the SQL statement  because of bind variable peeking.  In case you are wondering, the same estimated row cardinality was returned when the D2 bind variable was set to ‘09-MAR-2010′ in the application and the +1 was removed from the SQL statement.

Quite the problem we caused by pretending to not understand the impact of using a function in the WHERE clause on an indexed column.  We could create a function based index to work around the problem of the application programmers not knowing how to specify a specific date without using the TRUNC function on a DATE column:

CREATE INDEX IND_T3_C2_FBI ON T3(TRUNC(C2));

ALTER SYSTEM FLUSH SHARED_POOL;

But is creating a function based index the best approach, or have we just created another problem rather than attacking the root cause of the original problem?  We now have two indexes on the same column that need to be updated every time a row is inserted or deleted in table T3, and also maintained every time that column is updated (even when updated with the same value).  Let’s experiment with the function based index.

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY');

SQL_ID  3us49wsdzdun3, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')

Plan hash value: 3662266936

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |      1 |        |   9114 |00:00:00.01 |     576 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3            |      1 |   9114 |   9114 |00:00:00.01 |     576 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   9114 |   9114 |00:00:00.01 |     119 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

Cardinality feedback again helps out the cardinality estimate on the second execution, but look at the Predicate Information section of the execution plan.  We have now increased the difficulty of walking through a complicated execution plan with the help of the Predicate Information section of the execution plan to see how the predicates in the WHERE clause are applied to the execution plan.  Not so bad, right?  What happens if this column C2 is joined to a column in another table, or even specified as being equal to column C1 in this table?  Let’s take a look:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  C2 >= TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2 < TO_DATE('08-MAR-2010','DD-MON-YYYY')+1
  AND C2=C1;

SQL_ID  27rqhg1mpmzt9, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    C2 >=
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2 <
TO_DATE('08-MAR-2010','DD-MON-YYYY')+1    AND C2=C1

Plan hash value: 4176467757

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |   9114 |00:00:00.01 |     575 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T3        |      1 |   9114 |   9114 |00:00:00.01 |     575 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2 |      1 |   9114 |   9114 |00:00:00.01 |     118 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("C2"="C1" AND "C1">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "C1"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
   2 - access("C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C2"<TO_DATE(' 2010-03-09 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

On the first execution the E-Rows column for plan ID 1 showed that the cardinality estimate was 1 row, and on the second execution the cardinality estimate was corrected to 9114.  Notice that transitive closure took place – the filter operation on plan ID 1 shows the same restrictions for column C1 as had applied to column C2 in the WHERE clause.

Let’s try again with the SQL statement using the TRUNC function – this SQL statement will use the function based index:

SELECT
  C1,
  C2,
  C3
FROM
  T3
WHERE
  TRUNC(C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND C2=C1;

SQL_ID  ftu92j3z99ppr, child number 1
-------------------------------------
SELECT    C1,    C2,    C3  FROM    T3  WHERE    TRUNC(C2) =
TO_DATE('08-MAR-2010','DD-MON-YYYY')    AND C2=C1

Plan hash value: 3662266936

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |      1 |        |   9114 |00:00:00.01 |     576 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T3            |      1 |   9114 |   9114 |00:00:00.01 |     576 |
|*  2 |   INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   9114 |   9114 |00:00:00.01 |     119 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"="C1")
   2 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - cardinality feedback used for this statement

The cardinality estimate is again correct because of cardinality feedback, but notice what is missing from the Predicate Information section of the execution plan (transitive closure did not happen).

So, does the use of TRUNC(DATE_COLUMN) without the presence of a function based index qualify as an application bug?  What if a the function based index is present – is it still a bug?

Something possibly interesting, but unrelated.  I executed the following commands:

ALTER INDEX IND_T3_C2_FBI UNUSABLE;

(perform a little more testing)

ALTER INDEX IND_T3_C2_FBI REBUILD;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)

I received the following after several minutes:

BEGIN DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE);
END;

*
ERROR at line 1:
ORA-00600: internal error code, arguments: [15851], [3], [2], [1], [1], [], [],
[], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 20337
ORA-06512: at "SYS.DBMS_STATS", line 20360
ORA-06512: at line 1

The same error appeared without the CASCADE option, but a call to collect statistics on the indexes for the table, as well as other tables completes successfully.  I may look at this problem again later.

Continuing, we will create another table:

CREATE TABLE T4 NOLOGGING AS
SELECT
  *
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('01-JAN-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

CREATE INDEX IND_T4_C2 ON T4(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

Before we start, let’s take a look at the disk space used by the objects and the automatically allocated extent sizes:

SELECT
  SEGMENT_NAME SEGMENT,
  (SUM(BYTES))/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T3_C2','IND_T3_C2_FBI','T3','T4','IND_T4_C2')
GROUP BY
  SEGMENT_NAME
ORDER BY
  SEGMENT_NAME;

SEGMENT           TOTAL_MB
--------------- ----------
IND_T3_C2             1469
IND_T3_C2_FBI         1472
IND_T4_C2               10
T3                   21480
T4                     144  

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T3_C2','IND_T3_C2_FBI','T3','T4','IND_T4_C2')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;

SEGMENT            EXTENTS EXT_SIZE_KB   TOTAL_MB
--------------- ---------- ----------- ----------
IND_T3_C2               16          64          1
IND_T3_C2               63        1024         63
IND_T3_C2              120        8192        960
IND_T3_C2                1       27648         27
IND_T3_C2                1       34816         34
IND_T3_C2                6       65536        384
IND_T3_C2_FBI           16          64          1
IND_T3_C2_FBI           63        1024         63
IND_T3_C2_FBI          120        8192        960
IND_T3_C2_FBI            7       65536        448
IND_T4_C2               16          64          1
IND_T4_C2                9        1024          9
T3                      16          64          1
T3                      63        1024         63
T3                     120        8192        960
T3                       1       19456         19
T3                       1       43008         42
T3                       1       44032         43
T3                     318       65536      20352
T4                      16          64          1
T4                      63        1024         63
T4                      10        8192         80

Table T3 is using about 21GB of space while table T4 is using about 144MB of space.  We occasionally received an extent size that is not a power of 2 in size – a bit unexpected.  Let’s try a couple of SQL statements that access the two tables:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  f2v7cf7w2bwqq, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')     AND T3.C2=T4.C2

Plan hash value: 1631978485

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |  10044 |00:00:00.38 |   18622 |     25 |       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |   7095 |  10044 |00:00:00.38 |   18622 |     25 |  1223K|  1223K| 1593K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |   6857 |   9114 |00:00:00.04 |     394 |     25 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   6857 |   9114 |00:00:00.03 |      28 |     25 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |    452K|    452K|00:00:00.12 |   18228 |      0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The above used a full table scan on table T4, and you will notice that a filter predicate is not applied to table T4 to reduce the number of rows entering the hash join.  Transitive closure did not take place.  Let’s try again with the SQL statement with the other syntax that does not use the TRUNC function, nor the function based index:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  T3.C2 BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)
  AND T3.C2=T4.C2;

SQL_ID  5swqbjak147vk, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE    T3.C2
BETWEEN TO_DATE('08-MAR-2010','DD-MON-YYYY')       AND
TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60)    AND T3.C2=T4.C2

Plan hash value: 3991319422

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |  10044 |00:00:00.06 |     983 |      1 |       |       |          |
|*  1 |  HASH JOIN                   |           |      1 |   6761 |  10044 |00:00:00.06 |     983 |      1 |  1223K|  1223K| 1618K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |   6860 |   9114 |00:00:00.01 |     393 |      0 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |   6860 |   9114 |00:00:00.01 |      27 |      0 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T4        |      1 |   6762 |   9114 |00:00:00.03 |     590 |      1 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T4_C2 |      1 |   6762 |   9114 |00:00:00.01 |     127 |      1 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2010-03-08
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("T4"."C2">=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2010-03-08
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

Notice this time that transitive closure happened, allowing the optimizer to take advantage of the IND_T4_C2 index on table T4.

You are probably thinking, it must be that we need a function based index on the C2 column of table T4 also to allow transitive closure to happen.  Let’s try:

CREATE INDEX IND_T4_C2_FBI ON T4(TRUNC(C2));

ALTER SYSTEM FLUSH SHARED_POOL;

Now our SQL statement again:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  f2v7cf7w2bwqq, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) = TO_DATE('08-MAR-2010','DD-MON-YYYY')     AND T3.C2=T4.C2

Plan hash value: 1631978485

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |  10044 |00:00:00.33 |   18622 |       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |   7095 |  10044 |00:00:00.33 |   18622 |  1223K|  1223K| 1584K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |   6857 |   9114 |00:00:00.01 |     394 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |   6857 |   9114 |00:00:00.01 |      28 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |    452K|    452K|00:00:00.11 |   18228 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$"=TO_DATE(' 2010-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

As expected, the function based index on column C4 of table T4 was not used because transitive closure did not happen.  Do we still want to do it the wrong way?  The execution time could have been much longer than 0.33 seconds, of course, if table T4 were much larger and a large number of physical reads were required.  Try again using a larger table T4:

DROP TABLE T4 PURGE;

CREATE TABLE T4 NOLOGGING AS
SELECT
  *
FROM
  T3
WHERE
  C2 BETWEEN TO_DATE('01-JAN-2000','DD-MON-YYYY')
    AND TO_DATE('08-MAR-2010','DD-MON-YYYY') + (1-1/24/60/60);

CREATE INDEX IND_T4_C2 ON T4(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)

Table T4 now requires about 7.9GB of disk space.  Now a range scan that accesses tables T3 and T4 (each SQL statement is executed twice, with the last execution plan reported):

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY')
  AND T3.C2=T4.C2;

SQL_ID  2d4f5x92axqgn, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY')    AND T3.C2=T4.C2

Plan hash value: 1631978485

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |    874K|00:00:33.24 |    1062K|    302K|       |       |          |
|*  1 |  HASH JOIN                   |               |      1 |    849K|    874K|00:00:33.24 |    1062K|    302K|    33M|  5591K|   50M (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3            |      1 |    802K|    795K|00:00:00.56 |   33957 |      0 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2_FBI |      1 |    802K|    795K|00:00:00.20 |    2115 |      0 |       |       |          |
|   4 |   TABLE ACCESS FULL          | T4            |      1 |     25M|     25M|00:00:17.13 |    1028K|    302K|       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."SYS_NC00004$">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."SYS_NC00004$"<=TO_DATE('
              2009-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Notice the full table scan of table T4.  Now the other SQL statement:

SELECT
  T3.C1,
  T3.C2,
  T4.C3
FROM
  T3,
  T4
WHERE
  T3.C2 BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)
  AND T3.C2=T4.C2;

SQL_ID  539d93k50ruz3, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE    T3.C2
BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)    AND T3.C2=T4.C2

Plan hash value: 1243183227

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |      1 |        |    874K|00:00:05.80 |   85051 |    574 |       |       |          |
|   1 |  MERGE JOIN                   |           |      1 |    795K|    874K|00:00:05.80 |   85051 |    574 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T4        |      1 |    795K|    795K|00:00:02.43 |   51097 |    574 |       |       |          |
|*  3 |    INDEX RANGE SCAN           | IND_T4_C2 |      1 |    795K|    795K|00:00:00.41 |   10841 |      0 |       |       |          |
|*  4 |   SORT JOIN                   |           |    795K|    795K|    874K|00:00:02.00 |   33954 |      0 |    30M|  1977K|   26M (0)|
|   5 |    TABLE ACCESS BY INDEX ROWID| T3        |      1 |    795K|    795K|00:00:00.50 |   33954 |      0 |       |       |          |
|*  6 |     INDEX RANGE SCAN          | IND_T3_C2 |      1 |    795K|    795K|00:00:00.17 |    2114 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T4"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("T3"."C2"="T4"."C2")
       filter("T3"."C2"="T4"."C2")
   6 - access("T3"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

Notice that the above used the index on table T4, but performed a sort-merge join between the two tables.  We are able to force a hash join, as was used with the other SQL statement, by applying a hint:

SQL_ID  b9q6tf6p6x2m0, child number 0
-------------------------------------
SELECT /*+ USE_HASH (T3 T4) */    T3.C1,    T3.C2,    T4.C3  FROM   
T3,    T4  WHERE    T3.C2 BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY') 
    AND TO_DATE('01-JUL-2009','DD-MON-YYYY') + (1-1/24/60/60)    AND
T3.C2=T4.C2

Plan hash value: 3991319422

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    874K|00:00:03.60 |   85051 |       |       |          |
|*  1 |  HASH JOIN                   |           |      1 |    795K|    874K|00:00:03.60 |   85051 |    33M|  5591K|   50M (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID| T3        |      1 |    795K|    795K|00:00:00.54 |   33954 |       |       |          |
|*  3 |    INDEX RANGE SCAN          | IND_T3_C2 |      1 |    795K|    795K|00:00:00.19 |    2114 |       |       |          |
|   4 |   TABLE ACCESS BY INDEX ROWID| T4        |      1 |    795K|    795K|00:00:01.44 |   51097 |       |       |          |
|*  5 |    INDEX RANGE SCAN          | IND_T4_C2 |      1 |    795K|    795K|00:00:00.40 |   10841 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   3 - access("T3"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T3"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("T4"."C2">=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T4"."C2"<=TO_DATE(' 2009-07-01
              23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

9.2 times faster (just 5.7 times faster without the hint) by not using the TRUNC function and function-based index combination.  Are we able to just agree to do it the right way, or should I continue?  Without the function based index we receive an execution plan like this:

SQL_ID  2d4f5x92axqgn, child number 0
-------------------------------------
SELECT    T3.C1,    T3.C2,    T4.C3  FROM    T3,    T4  WHERE   
TRUNC(T3.C2) BETWEEN TO_DATE('08-MAR-2009','DD-MON-YYYY')      AND
TO_DATE('01-JUL-2009','DD-MON-YYYY')    AND T3.C2=T4.C2

Plan hash value: 1396201636

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    874K|00:03:05.88 |    3771K|   3013K|       |       |          |
|*  1 |  HASH JOIN         |      |      1 |    849K|    874K|00:03:05.88 |    3771K|   3013K|    33M|  5591K|   52M (0)|
|*  2 |   TABLE ACCESS FULL| T3   |      1 |    802K|    795K|00:02:34.36 |    2743K|   2743K|       |       |          |
|   3 |   TABLE ACCESS FULL| T4   |      1 |     25M|     25M|00:00:15.72 |    1028K|    270K|       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."C2"="T4"."C2")
   2 - filter((TRUNC(INTERNAL_FUNCTION("C2"))>=TO_DATE(' 2009-03-08 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TRUNC(INTERNAL_FUNCTION("C2"))<=TO_DATE(' 2009-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

A full table scan of a 21GB table and 7.9GB table, with 795,000 rows from the large table and 25,000,000 rows from the small table entering the hash join – probably not too good for performance.  Fix the performance bug in the application and let the user get back to counting the pencils in the pencil jar 9.2 times faster (or 51.6 times faster if there is no function based index).

While you might not frequently join two tables on a DATE column as I have done in this demonstration, how common is it to store numeric data in a VARCHAR2 column, and then need to be able to compare those values with numbers stored in NUMBER type columns, with literals, or numeric bind variables?


Interview: Spotlight on Tuning With Guy Harrison

Guy Harrison is the author of the newly released Oracle Performance Survival Guide, MySql Stored Procedure Programming (with Steven Feuerstein, 2006), Oracle SQL High Performance Tuning (2001), and Oracle Desk Reference (1999). He is currently a director of development at Quest Software, where he created the popular Spotlight family of products. He leads a software [...]

ORA-02050 Remote DB in Doubt? How About Gone for Years??

Interesting Issue… I was working on an interrupted distributed, (remote) transaction that I’d come across in one of our smaller reporting systems and attempting to rerun since 2008 against a decommissioned database, (yes, I said 2008…) This is a very “sublime” issue, as Oracle thinks it’s just taking care of a problem behind the scenes, but, like I said, this database no longer existed and was taken down while this remote process was running. The problem consisted of three databases, the reporting database, we’ll call REPRT_1, the current production warehouse, PROD_1 and the older production base, PROD_X


Oracle keeps submitting a new transaction in an attempt to complete this circle when it can’t find the PROD_X remote database, last follow through on 2/28/2010 and rolling back for quite some time using the PROD_2, in REPRT_1.

Here’s the pending transaction:
LOCAL_TRAN_ID GLOBAL_ORACLE_ID               STATE        RECO_TIME
------------------------------------------------------------------------------------------
6.114.203                 PROD_X.41a46747.6.114.203     collecting     2/28/2010

When did it orginally fail?
FAIL_TIME
8/26/2008


How do we trace this back?

Here’s an example of a failure from the log:
Error 3113 trapped in 2PC on transaction 1.120.13876. Cleaning up.
Error stack returned to user:
ORA-02050: transaction 1.120.13876 rolled back, some remote DBs may be in-doubt
ORA-03113: end-of-file on communication channel
ORA-02063: preceding line from PROD_1

Fri Mar 05 09:21:48 2010
DISTRIB TRAN REPRT_1.764783ad.1.120.13876
is local tran 1.120.13876 (hex=01.78.3634)
insert pending collecting tran, scn=72519859225 (hex=10.e2853c19)
If you look at the entry below, it matches the RECO_TIME in the select for the pending transaction above.

Sun Feb 28 17:42:39 2010
Errors in file /u01/app/oracle……/reprt_1_reco_20549.trc:
ORA-02019: connection description for remote database not found
From the reprt_1_reco_20549.trc trace file:
ERROR, tran=6.114.203, session#=1, ose=0:
ORA-02019: connection description for remote database not found
*** 2010-02-09 19:36:07.388
ERROR, tran=6.114.203, session#=1, ose=0:
ORA-02019: connection description for remote database not found

In this log, you will see every entry for the transactions that have tried to complete the original one, including 1.120.13876 shown in the example from the alert log… This completed to the commited one that you see at the end.

Since the original transaction and database it used to connect to is no longer available, there was a challenge, even after attempting traditional means of dealing with the issue, (i.e. standard attempt to FORCE COMMIT, FORCE ROLLBACK or exec dbms_transaction.purge_lost_db_entry('6.114.203'), enter manually into dba_trans, etc…) all failed as the status of the transaction was “collecting” and such.
Example:
force commit '1.108.13875';

ORIGINAL TRANSACTION FROM DBA_2PC_PENDING:

LOCAL_TRAN_ID  GLOBAL_ORACLE_ID FAIL_TIME
RECO_TIME
--------- ----------------------------------------------------------------
6.114.203     REPRT_1.41a46747.6.114.203      26-AUG-08  <--yes, that says 2008!!
28-FEB-10

At this point, I’ve come to the conclusion that since Metalink and googling hasn’t given me any resolutions, it’s time to take the information I have gained from these documents and start thinking up my own resolution. DBA’s are entering data into the tables, is it possible to address the collecting issue in the existing entry? 

I'm quite close, but can't seem to set the transaction beyond collecting due to this odd collection of circumstances and seem to be caught in a loop...hmmm...

Step 1:
Is the actual transaction "recognized" in the KT* tables at all???
SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID *
KTUXESTA Status,
KTUXECFL Flags
FROM x$ktuxe
AND ktuxeusn= 6;


Nope, confirmed- 6.114.203 doesn’t seem to exist, so after retaining the current information from dba_2pc_pending, I hold my breath and go on the attack- (Note- this is a last resort step when this type of situation occurs, remote database has been decommissioned for a long time!!)

Step 2:
First, make it stop trying to recover the transaction-
ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;

Second, update the status from collecting to committed:
update sys.dba_2pc_pending
set status='committed'
where status='collecting'
and local_tran_id='6.114.203'


commit;

exec dbms_transaction.purge_lost_db_entry('6.114.203');

SQL> PL/SQL procedure successfully completed.
Step 3:
Now that it’s committed, let it clean it up!
ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;

Step 4:
Review that ever growing trace file again!
Now I see at the end of the trace file, /u01/app/oracle……/reprt_1_reco_20549.trc:
*** 2010-03-05 11:59:12.699
is local tran 8.101.13783 (hex=08.65.35d7))
delete pending committed tran, scn=72630478954 (hex=10.e91d286a)

Finally!

Now we just need to check and verify the rows disappear in each of these tables as the distributed recovery recognizes that the process is no longer collecting and does it’s job!

SELECT * from dba_2pc_pending;
SELECT * from dba_2pc_neighbors;
select * from sys.pending_trans$;
select * from sys.pending_sessions$;

The above should all be empty in a short amount of time…
Monitor your alert log, verify that you have resolved the issue successfully and in my case, the problem was resolved.

It Was a Dark and Stormy Night (A Story About Deadlocks)

It was a dark and stormy night; a coven of database administrators was gathered at the local watering hole, trading “big fish” stories (interlaced, like a concoction of couscous sprinkled with parsley, with the occasional “How I Trashed the Database” story) and quaffing the generous libations poured by the master of the house. Suddenly the frantic [...]