Hotsos 2010 – Monique



There she goes, squirreling into the corner of your carry-on.  Ready for another adventure.  Tucked between a fun floppy hat and the flourescent sun-block, of which she uses plenty, pale creature that she is, Monique is situated to prove to you that a seasoned traveller knows best.  Dressed in a curly mohair (so as not to show the wrinkles that betray one new to the jet set) and sporting a pale chiffon scarf - to double as a disquise in those cities where she is already known for her somewhat silly escapades - she seems to wink at you and beckon you along.  Forget your cares, but by no means your playing cards, and step lightly into whatever dreams your next voyage should hold for you:  let Monique be your guide.  

Her very presence alleviates all the maladies brought on by the jostling of trains, the dipping of ships, the swerving of buses and even by the simple fact that you are far from home.  For she has discovered the secret to successful travel; with the exception of the luscious few minutes in the X-ray machine, where she delights in making funny faces at the operators searching for more dangerous contents.  Monique never looks back.

Charleen Kinser Designs

(With thanks to Carol Dacko for bringing along an interesting dinner companion for my own cuddly friends. Somehow I doubt they'll ever be the same - she's terribly sophisticated for my lot!)

Hotsos 2010 – What’s THAT?

I heard someone bemoaning the lack of Swag at Hotsos - just an event program, couple of magazines and a small clockwork toy sponsored by Oracle. I don't think I ever come to the Symposium expecting Swag - it's just not that kind of event. But you do tend to get good speaker gifts, including the best speaker gift I ever received. This year, it's a digital photo frame.

As for that clockwork toy, the Cuddly Toys think it's *brilliant*!!! At first, their reaction was "What's THAT?!" then when I wound it up and let it dance in front of them, they fell in love.



The only thing is that it doesn't run for long per wind-up, so my right wrist is aching from them continually beggind me to "Do it again, Douglas!"

Then again, their reaction wasn't quite as relaxed as when they met Carol Dacko's friend while we were at the restaurant. Apologies for the quality of the photo, it was dark in there, but at least it proves that Little H and Chris were allowed out for dinner and that not all Cuddly Toys are made the same.



They were just stunned by their new friend. What's THAT?

All will be revealed in the next post ....

Hotsos 2010 – Day 4

First up was Cary Millsap's - Lessons Learned, Version 2010.03 As Cary pointed out, they always try to put the best speakers in the toughest slots - 8:30 in the morning post-party. I think local guys are slightly more reliable too because they might have actually gone home the night before! He started with a quick Hangover Survey (me - check!) and then pressed on talking about how we test system performance.

He showed a video of Boeing stress-testing the wing of the 787 and, as he pointed out, aircraft manufacturers really know how to stress-test! (Of course whether that reassures you as it does me, or makes you wish no-one would talk about wings disintegrating, as it probably would Mads, is personal.) They showed Boeing test equpiment which is complicated, expensive and non-revenue generating. Those tests are expensive but when people's lives are on the line, what choice is there? Boeing knows that it has to test the analytic models used in the design. He spent a lot of time talking about good test design. A few thoughts that stood out to me ...

- Some stress tests are a waste of time. Will the Boeing 787 land on the moon? If this test fails, what has it proven? If it passes, then it's awesome but it would be a very expensive way to prove it can cope with commercial flights in Earth's atmosphere.

- Why test for more than you will see in Production? Because you don't really know for sure what you'll see in Production.

- At some point, but I can't remember the context, he used a Scottish phrase that he'd heard Billy Connolly shout (although the Big Yin was only fully credited later in the day) ...

    "There's no such thing as bad weather, just the wrong clothes"

... looked over at me and said - "I'd love to hear you say that, with the proper accent". I declined politely.

- Most people try to prove only that their systems will work.

- Most tests of systems that are destined to fail never proved it in advance.

- Test to destruction

    a) Test
    b) Until the system melts
    c) Decide whether your real requirements are likely to be lower or higher than melting point.

There was a small amount of time for questions and once it looked like they were done, I granted Cary's wish (never thought I'd say that), stuck my hand up and repeated The Big Yin's words. It was only after the laughter had stopped that I realised I might have ruined his big closing, but I think he was ok about it ;-)

Next was Tanel Poder talking about LGWR, log file sync waits and COMMIT performance and shock, horror, I was actually going to say that this was one of the least rewarding sessions of the week for me. What?!? Tanel? But he's, like, an Oracle God! LOL But there were reasons

- I realise that I know a *lot* about how log file sync and log file parallel write work, how they relate to each other and some of the problems they might help you identify. Because it's a subject I'm *so* familiar with, I didn't learn much.

- His main demo didn't quite show what he wanted it to because it didn't run multiple sessions but, frankly, I'm in no position to talk about demos this week!

By the end, the presentation turned out ok, not least because there was another unexpected appearance from Bob Sneed to talk about the I/O components involved in  redo log management including a suggestion that LGWR be put into a higher scheduling class (but not Real Time!) I'll try to find a link to his slides and let you take a look yourself.

I loved Tanel's Big Log File Sync Tuning Secret, though ...

    COMMIT LESS!

It was particularly relevant to me because I had a Big Log File Sync Tuning Secret as the closing moment of my own presentation. The problem was I couldn't use it after the demos went wrong!

    USE ASYNCHRONOUS COMMITS

But, in my case, that was supposed to be funny, too.

I ran off to try and use the free breakfast voucher that Marco had given me but I was just too late. No food again, then :-( Well, I had a couple of slices of cold meat at lunchtime, but mainly to catch up with Alex G before he had to present and then head back to Ottawa. I managed to skip one session at this stage but, after a quick call home, I decided to go along to Alex's RAC Connection Management presentation after all (a little late). Although I have seen some of this stuff before, I always enjoy watching Alex's demos and was particularly impressed by the fact that he'd managed to write his own RAC connection load balancer! I was waiting for the applause in the room but either people didn't quite get it or there was just a lack of energy post-lunch on the last day. I suspect the latter.

Of course, once I'd said goodbye to Alex properly (don't see him nearly enough), I was a little late for whichever session was going to be my final one of the conference and I was hopelessly torn between Kyle Hailey's modern SQL performance tools presentation (Kyle's done a lot of cool work in the area of Oracle Performance Visualisation) and Chris Antognini's Diagnosing Parallel Executions Performance. In the end I plumped for the latter because I thought it was going to be like something I'd unsuccessfully attempted a couple of years ago and I wanted to see if Chris had a different angle on it and had been more successful. In the end, I probably made the wrong choice because although Chris' presentation was great, it was really all stuff I already knew. Definitely my bad call, though. Hopefully I'll get a chance to catch up with Kyle's presentation at some point in the future too!

After that there was just the usual short farewell and thanks from Gary Goodman of Hotsos. Although the thanks were appreciated, I'm glad they were spread around everybody because the attendees are one of the things that make this conference great and Becky and Rhonda did their usual sterling job of organising everything.

Then it was time for some Fajitas with a few friends (actually, a whopping great number of friends who practically filled the Mexican restaurant!) and a few very sedate beers. (We are old men (and women) now and the night before was a big one!) While we were waiting to go to the Mexican, I had one great surprise left - Alex's flights weren't going to get him home, so he came back from the airport and had to check in overnight! At least I got a chance to talk to him properly when I wasn't hopelessly drunk and didn't try to seduce him this time.

Now I need to stop blogging and get back to listening to Tanel's Training Day (good stuff, too, but more about that later)

Hotsos 2010 – Day 3 – An excellent one (part 2)

In the end my work call fell through so I had the unexpected opportunity to see Marco Gralike's XML presentation. Despite the fact that he was scheduled opposite Tanel Poder talking about performance fundamentals, which impacted the number of attendees, I thought it was terrific. I had my own brief flirtation with XML around 10 years ago but, contrary to my expectations, there was lots for me to learn here and it was interesting to see how much more mature Oracle's offering is these days. I suppose 10 years is a long time, but a lot has happened while my back was turned.

The slides were excellent, the pace was good (although he did run over a little) and I could actually understand what he was saying for a change ;-) I walked away very impressed and when I was talking to a couple of his attendees at the bar later, it seems I wasn't the only one.

Another unexpected bonus was that Henry Poras had to cancel for personal reasons and although I was extremely disappointed by that because we share interests, replacements don't come much better than former Sun luminary Bob Sneed, who is now an independent consultant, available for hire.

His presentation was about one of his favourite topics, CPU Quality of Service. Rather than just measuring how much CPU is available or being used, we need to drill down into how it is being used. Even a 100% busy system does have available CPU really when you think about it, because if we could make our application more efficient, that would release CPU or perhaps we're exceeding our SLAs and could reduce CPU consumption and still reach our SLA targets?

But, most of all, it's about the quality of the resource delivered to applications that need it. He walked through a number of case studies of 4x to 16x system performance improvements, some as simple as changing scheduling strategies and some down to bug fixes or application architecture tweaks to improve Cycles Per Instruction (CPI) figures. Chip architectures seem complex these days so although the O/S thinks the CPU is busy, who knows what it's busy doing?

My final presentation of the day was Kerry Osborne's Scripts. I enjoyed his presentation as much as I enjoy his blog posts. He's a very down-to-earth and modest guy but clearly has shed-loads of practical experience and a great way of communicating it. He was bothered a little by a slightly slow network connection but, frankly, I don't know what he was whining about. That man has no idea what a demo problem is! ;-) I really liked his attitude towards his scripts, too - take them, use and abuse them, knock yourself out! Well, his scripts have helped me many times over the past year. A good guy.

After that, I was just about ready for a beer so adjourned to the bar with Marco hoping to catch up with Kerry and buy him a beer I'd promised him. It was 4:30 at this stage and I lazily decided to skip the last session. The usual daily tiredness was starting to kick in which wasn't helping but, in retrospect, curing it with alcohol probably wasn't the smartest strategy! Carol Dacko and Kevin Closson showed up and it was good to get another chat with them because I don't get to see either of them often. Kerry turned up eventually and, although I may have bought him that promised drink, I'd have to admit that he did all the leg-work by going to the bar and bringing a large round of drinks back. Saved me a trip ;-)

Slowly it began to dawn on me that my planned trip back to my room to freshen up and change before the party was in jeapordy as the clock ticked around until 7:30 and I noticed Kyle Hailey's other half unbuttoning his shirt to the navel in preparation for Disco Night.  (Check the start time above - 3 hours drinking *before* the party and, as usual, no food.) It's little wonder that I was quite as drunk as I was, unusually so, believe me. Although I think I just about managed to behave myself, it was a close-run thing. As I pointed out to Alex and Marco, you always know when a Scotsman is truly drunk because he starts declaring  his undying love for all and sundry. I think at one point I was actually trying to seduce Alex, but he was having none of it! Oh, my god, I think I was dancing for a few minutes!

Time to draw a discrete veil over the day, I reckon.

Hotsos 2010 – Day 3 – An excellent one (part 1)

Well, that was a nice start to the day! Someone came up to me just before Wolfgang Breitling's presentation to point out that something from my presentation had helped him fix a problem at work last night. It was the OEM Raw Data drill-down that shows you the underlying ASH data for sessions, including backgrounds. He'd used it to identify the timed event leading up to a problem with a crashing smon process. Actually, the more people I speak to, the more I get over the presentation. Most people recognised I made the best of a bad situation but I'm glad that someone actually learned something, too!

Wolfgang's "Anatomy of a SQL Tuning Session" was one that I managed to miss at UKOUG and I'm glad I made it this time. It revolved around taking a single SQL statement that took 9 minutes to execute and walking through various tuning iterations, using modern tools and optimiser possibilities, leading to a sub-second execution time. It was a natural extension of Wolfgang's Tuning by Cardinality Feedback that I've often recommended to colleagues. He covered a variety of techniques including :-

- Converting parts of the statement to scalar sub-queries
- Subquery factoring
- Transitive closure, with a useful tip that it may be worth experimenting with specifying apparently redundant join predicates to give the optimiser more information to work with.
- Using the Outline part of 10g DBMS_XPLAN to identify the set of hints that would create a specific plan and then using some of them yourself (but this is far easier if you name your query blocks).

What I particularly liked about this presentation was the way that Wolfgang illustrated execution plan steps and changes with some nice slides, highlighting a few steps at a time. But it made me feel a lot better about my mobile phone going off the previous day when Wolfgang's *own* mobile went off during this one ;-)

Next up was Neil Gunther with "How to Quantify Oracle Scalability - Part 1", a presentation about applying his Universal Scalability Law (USL) to Oracle systems. There were quite a few high-level points I picked up from this.

People often assume that the purpose of a model is purely to predict the future but it's probably just as (or more) useful as a method of validating test results because in his view 'Data comes from the devil and models come from god'. I'm not sure I agree with that. Test results, even incorrect results, represent reality to me that can't just be explained away by Maths which doesn't agree with them but I suspect that's my peculiar perspective. I much preferred the suggestion that we use models and data together because, whilst he might not trust my data, maybe I don't trust his model yet?

He talked about how the USL allows for the phenomenon of reduced throughput as workload increases which is something I think I've seen before by adding Coherency to Amdahl's Law. The USL is definitely worth more investigation. As a non-mathematician, though, I suspect I always struggle with this stuff.

Which was why I was *so* relieved that I made the tough decision to skip Riyaj's presentation and stayed for the second part of this two-hander by Peter Stadler - "How to Quantify Oracle Scalability - Part 2". This was a more practical examination of the USL in relation to Oracle systems and as someone who is very interested in performance in general and the relationship between Response Time and Throughput in particular, this hit the spot. What was slightly bizarre, though was when I recognised the URL for this blog post and the test results come up on screen. I think I'm right to say that this is the second consecutive Hotsos Symposium where this one post has been discussed (by Cary Millsap last year) so I must be doing something right ;-)

Peter spent the next 20 minutes or so talking about some of the comments on the post and plugging the results into the USL. I must admit to being slightly surprised by the fact that Peter didn't think to drop me a mail to let me know he was going to talk about it so much because I might have missed it and it was fascinating! He talked about the lack of detail in the results, but that was because the blog post had an extremely simple message - are you looking for High Throughput, Low Response Times or both? Regardless, if he'd asked me, I could have given him some more information to work with. For example, there was some discussion about measurement errors in data and performing multiple runs to address that which is something I did, but only published one set of fairly representative results.

So it was all a bit strange and unexpected, but utterly fascinating to see someone apply a mathematical approach to my empirical results. I hope that Peter might post the slides and add a URL to the blog post so that everyone can share what he found. I think that's the point of the comments thread and of blogging in general - sharing information and knowledge and building a discussion.

Next I managed to eat a little bit of much-needed lunch with Paul Matuszyk and then had to get ready for my important work call.

Hotsos 2010 – Congratulations, Marco!

You managed to capture a couple of minutes of my presentation when there was a picture on the screen!


Hotsos 2010 – My Presentation

I really don't know how to blog about this because every time I feel I'm honestly self-critical, everyone thinks I'm close to suicide or something. I like to think I notice both the good and the bad but am probably more likely to speak openly about the bad. If you ask me it's a Scottish thing about not getting above yourself or blowing your own trumpet too loudly. Or maybe it's just that the only way you can improve is by noticing the bad stuff and fixing it? Someone once said to me "You can't be an insecure overachiever without first being insecure" ;-) and they aren't Scottish, so I shouldn't generalise. Updated later - it was Cary Millsap who came up with this line. I loved it when he mentioned it a few months ago and still do.

In this case I'll try to be even-handed and finish with the positives!

Negatives

The whole point of this presentation is that it's about 50 minutes of demonstrations and about 10 of slides. For the first 30 minutes, the demonstrations would not work. If you think that's a success, there's something the matter with you and you should probably never give any presentations!

This is the killer. In retrospect I know what the problem was and have fixed it previously with the help of others but did not do the same on the new laptop! As I moved in and out of wireless range, the lack of a network connection at the Windows end completely threw VMWare and my demos. But, believe me, when your demos have been working for a couple of weeks, you don't have long until your presentation and they stop working intermittently, it's difficult to be cool and analytical. I was so distracted by other things that might go wrong, I missed one and, having missed it, I wasn't cool enough to recognise the symptoms. Fortunately, an old Hotsos friend in the room came up with the goods in the form of a little Sprint wireless box that got me hooked up. I kissed him, but there were no tongues involved.

With only 30 minutes left, there was very little I could show and it completely ruined the whole flow of the presentation, which I'd worked so hard to get right and which I know can be terrific, because I've done similar presentations before and had been looking forward to doing the best version yet. People took the time to come and see it, I let them down and I'm sorry about that. Nobody is likely to change my view on that.

This presentation is a big deal to me, never mind anyone else. I work hard on these things, try to cover all the angles, take time off work and all because I like teaching people new stuff. When I take a week off to attend a conference, but also to present, it's a bit of a blow when your one shot fails. If you're not passionate about your presentations (and I somehow doubt anyone isn't) again, you shouldn't be presenting.

Positives

Sh*t happens when you get involved with computers. Yes, folks, I know that, but that's also why you play around with demos for a long time to minimise the possibility that it will!

This might not sound like a positive and it's dangerously close to sounding like an excuse, but the fact is that I've had 2 out of 3 presentations go very wrong recently. One of the consistent factors in this is that I switched to VMWare because I had to to run 11gR2 on Windows. It's not the same as blaming VMWare to say that I've been having to deal with stuff I haven't for a long time. When everyone was talking about VMWare as the way to go for presentations, I remember thinking 'I don't know, just seems like more moving parts that might break to me.' and I kept reading blog posts about demos being broken and then fixed just in time, all of which were on some virtualisation platform or other, but didn't have the confidence to say something. All I know is, say what you like about Windows, but I've hardly ever had a problem in multiple 2-day course teaches of performance and OEM stuff! Still, it's down to me to get on top of what are some simple issues.

Listen, I know I can present. No false modesty round here. So, to wrap this up on a positive note, I'm well aware that there are few of the people who I see present who could have managed to get through that first half hour, make people laugh, keep thinking about the problem and manage to get a short demo of Swingbench into the bargain. I think I also managed to salvage something out of the last half of the presentation without completely collapsing into a heap. Sure, I was a bit brain-addled by then, but I would have liked to have seen how others might have coped ;-) Frankly, I kept waiting for the room to empty (I've seen it happen) but the vast majority stuck around to the end. Maybe they were sadists! LOL

I am not and will never manage to be happy with that presentation but lessons have been learned, it's just a presentation and there'll be lots of others. I know that.

Postive Solution 1 - If anyone wants to try to grab me while I'm at the conference, I'll show you the screens and demos. They're very cool ;-)

Positive Solution 2 - I think Alex Gorbachev might try to arrange for me to repeat the presentation properly as a webinar. I've already done this at my current customer site once (on the 10g stuff) and it went reasonably well. I'd also ... get this ... asked Marco to video it for me yesterday with his snazzy mini-setup because I thought I might post a few bits online if they were particularly good so those that can't make it to conferences could get a taste of it. Actually, it appeals to my cold, self-deprecating sense of humour to post some of it online soon. I promise I won't make it too self-flaggelating though!

P.S. For Paul Vallee. Paul it is not all good :-)

Hotsos 2010 – Day 2 – The conference begins

A 3:30 start, which gave me lots of time to work on my demos and by breakfast time at 7:30, I felt in a reasonable place (but not quite done yet) and headed down to eat ... wait for it ... some fruit! Then again, I'd eaten so little the previous day that I had to eat something.

First up was Gary Goodman opening the event and introducing Tom Kyte's keynote with :-

a) A video of Tom and Hotsos regular Patty, Disco Dancing to mark tomorrow night's Disco theme. Personally, I thought it was hilarious and I hope it gets posted online somewhere. I'm not sure everyone found it as funny, but it did the trick for me.

b) A gift of a custom set of poker chips because I believe Tom's a player (of poker, that is). Tom's keynote was about (looks for title in notes and can't find it) the mistakes we make *because* of our experience and our assumptions. It was thoroughly entertaining and kicked off with a quiz for four volunteer participants who had to answer 12 apparently simple questions very quickly. They were trick questions, of course ;-) Rather than me taking voluminous notes, I'm hoping that Tom might post the questions on his blog (hint, hint) even though they were meant to be answered quickly based on verbal questioning.

Oh, and I couldn't believe it when my phone started ringing, for someone who's normally exceptionally careful about that stuff. I had it on silent last night, missed a text from Alex G and so had taken it off silent. Clown! (In retrospect, this should have been a warning sign.)

There were tons of great examples of where clever people have gone spectacularly wrong through over-confidence. We're often wrong when we answer quickly based on our prior experience because things change. I suspect I sometimes frustrate colleagues by not giving the snappy answers they expect, but I know that Oracle stuff is often not as obvious as it seems at first sight. There was even a sighting of Martin Widlake's "Making Things Better Makes Things Worse"  but no name-check, Martin, so you're only *almost* famous ;-)

I particularly enjoyed the video I hadn't seen of Richard Feynmann talking about the uncomfortable state of confusion and feeling stupid. (With hindsight, why did I not see the warning signs ...)

Then I skipped two sessions I was looking forward to but, as I've said before, I need to take care of my own presenting business before I can enjoy and learn from other people's presentations. However, I bumped into a friend after Richard Foote's session and he thought it was amazing. My friends first trip to Hotsos is going well.

Sitting in my room, I was really happy with how the demos were looking, raring to go and went to iron my shirt. Damn, the iron wasn't working and I really didn't have time to wait for another to show up. No problem, though, because I always carry a special shirt that *really* doesn't need ironing - just hang it up in the bathroom with the shower on. (Another warning sign, though?)

I'm very superstitious about using any new kit for the first time, so I was still nervous about a couple of things with the demos

1) Performance, because I know I'd run them all weekend and have a new powerful laptop (more on that another time), but the demos are designed to hammer the machine. So I thought I'd disable everything I didn't need running.

2) Driving the projector because I'd never driven one with the new laptop. I arranged to check that my laptop would drive the projector ok. It seemed to, but one of the demos was behaving strangely. No time to investigate because Dan Norris needed the podium, so I decanted to Alex's Battle Against Any Guess presentation because we'd been discussing it, I know he was concerned about how it would go and wanted to show a little moral support.

In the event, he needn't have worried. Personally, I find his Russian accent all but impenetrable, particularly after we've had a few beers and are both shouting at each other, but I'm sure the feeling's mutual! But the fact that he had the room rolling about with laughter at regular intervals meant it had gone very well! I hope he still does plenty of technical presentations with lots of demos, but he proved he can do something a little more conceptual and message-driven. Better still, I'd been frantically trying to get my demos working at the back of the hall by re-enabling services I'd disabled and they started to work :-) I turned round to Cary Millsap who was hiding their too and dramatically mimed wiping the sweat away from my brow. Time to head back into the other room and watch the tail-end of Dan Norris' Database Machine presentation. I didn't see nearly enough and was completely pre-occupied by what was to come, so will need to catch up on the slides later.

Some interesting Q & A at the end which were slightly less interesting for me because my demos started playing up again! So, a room full of people, I'm rebooting my laptop, nothing is working and then, bang, it's time!

Whilst my presentation was originally part of this post, my review is so long-winded and personal, I've moved it to another post that people can choose to skip over! That way it doesn't detract from the good presentations I saw yesterday, such as Kevin Closson's "Ten Years After Y2K And We Still "Party Like It's 1999"".

Excellent, as always, and packed full of information, it also sort of tied up the way the day opened, albeit in a very different tone and style that things change. Kevin has the perspective of having been in the industry a long time, so he's seen the past, but also being able to see where we are now and where we're going, so he concentrated on a few of his pet subjects that I'm pretty sure you'll see cropping up on his blog in the near future but, to give you a taster, he talked about the merits of SMT on Nehalem EP chips for different workloads, which is something we'd been talking about at the opening reception. I'm pretty sure I can't disable it on my laptop to give it a try, but I might have a word with him about it anyway. He talked about Flash, Direct-attached storage, NFS, virtualisation and all that good stuff as always but because I missed the first 10 min for a much-needed break and couldn't read the slides because I forgot my glasses, I'm going to have to try to hit on him for a copy of his slides later! Oh, and I was wondering why the presentation was well attended but not packed. I'd forgotten Tom Kyte was speaking next door. I was happy with my choice.

I did have one more presentation scheduled, to see the first of Kerry Osborne's but I was pretty low at this stage and sleep deprivation was kicking in, so I headed back to my room to catch up with some blogging and tweeting (thanks for the support) and then it was time for dinner with the Oak Table crew. Very nice it was, too, hats off to Marco and Carol for choosing that place, the company of all and for Carol's consistently top-notch organisation and geek-ferrying. Nice big lump of beautiful steak, some chips and a beer. That's more like it! A few more drinks back at the bar but I made my excuses and left at 11:30 despite some vain attempts at hypnotising me into staying! ;-)

Oh, but the weather was still rubbish all day :-(

Regardless of everything that happened yesterday, I now have 3 days of no responsibility and can just learn, socialise, eat and perhaps sleep a bit more. I woke up at 3:30 again this morning, but turned over and managed another 2 hours. Bliss!

Hotsos 2010 – Day 1.79 – Friends show up

Hard at work preparing for my presentation yesterday until it got nearer to 5pm and people started coming in on their flights. First up was Kevin Closson, who looked a somewhat sad sight (by his usual standards), clutching a bag from a PC store containing what he hoped would be the crucial bit of software that would help him solve his over-written MBR. I waited for him in the bar and discovered Mark Bobac had Ubuntu Live on a USB stick. I suppose I could have gone and told Kevin, but I knew he'd be down imminently ;-)

It was also a brilliant experience to bump into Paul Matuszyk again, who I worked with briefly 4 years ago at Sky. This is his first conference, largely inspired by a set of Symposium 2006 notes I gave him and my encouragement so it was a) great to see him again and b) pleasing when I saw him coming out of Richard Foote's presentation, blown away by how good it was. I think he's already decided it was a good decision ;-)

Then it was the usual suspects, the usual drinking and some chat with KC, who always has smart stuff to say. In amongst that, I liked him pointing out that my new iCore 7 laptop is probably what they used to euphemistically call a 'Portable' computer. He's not wrong there ;-)

Not too late to bed and really looking forward to the first day of the conference and my presentation.

Hotsos 2010 – Day 1.34 – Did someone Scottish show up?



It'd better be better tomorrow, after all the crowing I've done over the past few weeks about the prospective weather!

Hotsos 2010 – My Agenda

Let's see how well I can stick to this

Sunday

18:00 - Registration and Reception
21:00 - Bed

Monday

09:45 - Tom Kyte: All About Metadata: Why Telling the Database about Your Schema Matters   
11:00 - Richard Foote: Oracle Indexing Myths
12:00 - Lunch (make sure new laptop works with projector, get changed and start panicking a bit more)
13:00 - Dan Norris: Consolidation Strategies for Oracle Database Machine (would love to see Alex, but this one's relevant to work at the moment)
14:15 - Me: How I Learned to Love Pictures — Oracle 10g/11g Performance Analysis Using OEM
15:30 - Kevin Closson: Ten Years after Y2K, and We Still "Party like It's 1999"
16:45 - Kerry Osborne: Controlling Execution Plans (without Touching the Code)
Evening - Oak Table Dinner

Tuesday

08:30 -Wolfgang Breitling: Anatomy of a SQL Tuning Session
09:45 - Neil Gunther/Peter Stalder: How to Quantify Oracle Scalability, Parts I and II
12:00 - Lunch (Blogging)
13:00 - Marco Gralike: The Ultimate Performance Challenge: How to Make XML Perform.?!
14:00 - Important work conference call (so I'll miss Henry Poras, who I planned to see)
15:30 - Kerry Osborne: My Favorite Scripts — 2010
16:45 - Richard Foote: Oracle Indexing Tricks and Traps
19:00 - Party Night
22:00 - Bed (probably)

Wednesday

08:30 - Cary Millsap: Lessons Learned, Version 2010.03
09:45 - Tanel Põder: Understanding LGWR, log file sync Waits and Commit Performance
11:00 - Dave Abercrombie: End-to-End Metrics for Troubleshooting and Monitoring
12:00 - Lunch (Blogging)
13:00 - Alex Gorbachev: Run-Time Load Balancing in Oracle RAC
14:15 - Christian Antognini: Diagnosing Parallel Executions Performance
15:30 - Farewell
16:30 - Relaxation (which might well be poured into a glass or three)

Thursday

08:30 - Training Day with Tanel Põder

Friday

Catch up on emails and blogging, then head home ...

As usual, there are a couple of really tough choices in there so a) no slight intended and b) I might make a few last minute changes of mind!

Hotsos 2010 – Day 0 – Cuddly Toy Carnage

It's 4am on Sunday, so it must be America.

This time I think I'm going with the Jonathan Lewis approach of sticking to GMT while I'm over here and accepting I'll be waking up at some ungodly hour which means I can get several hours work done before the day kicks off. That might mean being in bed at 9pm most days or a mid-afternoon nap somewhere along the way but I won't worry about that for now.

Yesterday was a long but pretty uneventful day. Door-to-door was around 20 hours but only 10 hours of that was spent in the air. After Continental being responsible for one of the two worst Transatlantic trips in my past (and thanks to Virgin Economy for the other), I must have been mad to book with them again, particularly with a 6 hour layover at Newark. (Thanks to US Immigration for wiping an hour off the layover, though!) I had a pretty similar schedule at my first Symposium in 2006 which was a conference I particularly enjoyed and went well so maybe I booked it again for luck or maybe the fact that Continental joined the Star Alliance last autumn is more to do with it and I'm a hopeless miles junkie. I'll let you decide. In the event, it was a pretty good trip for Economy - empty middle seat on the first flight (with a very nice Video on Demand system) and an entire row to myself on the second. Add 6 hours worth of lounge access in the middle (allowing for several trips through security for a smoke) and I couldn't have asked for much more*. I still hope the upgrade I'm wait-listed for on the way back comes through, though ;-)

As usual, I found it virtually impossible to get any work done on the way. I always think I'm going to use that utterly wasted day preparing for my presentation, but I find that all I really want to do when I'm travelling is watch movies, read the papers or chill out. I'm just not in the right mind-set for work. I need to hit the hotel before I can begin work in earnest, but that's the benefit of arriving a day early.

By the time I reached the hotel it was about 10pm so I only managed two small beers with Marco Gralike, Mark Farnham and Mark Bobac. I mentioned something to them that's been on my mind this week. I really do love this conference and it hit me during the week that, as well as the obvious factors of it being dedicated to performance, great speakers, educated attendees and all the other stuff I've mentioned in the past, it just feels right and I'm convinced that's to do with the size and location. Whereas at other conferences I'm bumping into people all the time, get talking and therefore miss other people, at this conference everyone's in the same place so I'm with everyone all the time, if that makes any sense. That's also true of the Miracle events, but you need a much stronger constitution to keep up with things and there's nowhere to hide! Regardless, I get genuinely excited about coming to the Symposium, excited about doing my presentation and it's no different this time. Oh, and as U.S. hotels go (I'm not a fan - different style) I really like this one. I was torn between the complimentary suite upgrade and the smoking room. No prizes for guessing which I picked ;-) I've got *work* to do!

Unfortunately for the Boys and Girls it was a full 20 hours in that suitcase and I don't think luggage handlers are thinking about Cuddly Toys. I think I'm going to have to start shipping them in proper carriers with a decent air supply. One of the problems is that I was torn between a case too small and a case too big. Once I picked the case that was too big (under significant pressure from a certain Comfy Seat!) there was a little bit too much movemement in transit!



Still, they all seem a lot happier now, if not a little concerned about how they're actually going to watch the TV! (Note for photo snobs ... I've bought Mads' compact camera with me this week so used that for this second one. I'll see if I can make the photos a little more bearable than those taken on my phone, but no promises.)



Top row, L-R : Chris, Derek Jr, Mario
Bottom row, L-R : The Rabbit, Little H, Austin

Should anyone doubt me, they were all unpacked *before* I went down to the hotel bar!

Today I will be mostly hiding out and working on my presentation and making sure that my demos are going to work on my new laptop. If that sounds like me repeating the mistakes of my past few conferences, the idea is that if I can do it today and then present tomorrow, then I'll be done and I can kick back, relax and learn things, building up to the crescendo of training day with Tanel Poder on Thursday ;-)

Now, about that presentation of mine ....

* See, this is what I mean about Twitter being useless. If I want to bitch about travel (and there seems to be a *lot* of bitching about travel on Twitter), then I want more than 140 characters! While I'm talking about Twitter, I will blog about my experiences soon, over on the personal blog, but I could always see a good case for it being slightly more interesting and useful during conferences, so I set up a @doug_conference account for me at conferences. I tried to do it using my normal mail account, but I seem to have cast that into a pool of dark matter by having the temerity to delete my previous account. I won't lose any sleep over that! Once Cary Millsap, Alex G or one of the Twitterati tell me the hash tag thing for Hotsos 2010, I might try using it! Oh, and if there are fewer blog posts from me this conference, blame that Twitter crowd and their inability to find 30 min a day to blog - they're a bad influence, I tell you ....

OOW Call For Papers open

Until 21st March.

Video: UKOUG Conference Series Technology & E-Business Suite 2009

What a great video!

I present my evidence, Your Honour ...

0:50 - You get a sense of how scary it might be to speak in Hall 1.
0:59 - Kurt the DUDE guy falls asleep whilst listening to a keynote.
1:32 - Tony Davis, ex-Apress editor points out that it's the networking, stupid!
2:12 - Robyn Sands points out that by going to UKOUG Tech/EBS Conference you will hear 'the top-notch people'
2:20 - One of the 'top-notch people' walks around a bit on stage, hand-waving.
2:30 - Someone who is clearly the most photogenic speaker on the agenda puts in a second appearance.

Plus lots of other strange sightings of friends ...

Statistics on Partitioned Tables – Part 5

Actually, before looking at any recent features, let me introduce one more aspect of the existing aggregation approach used by Oracle. The examples used to date have been based on INSERTing new rows into subpartitions and, although that's the approach used for some of our tables and will suit some systems, the likelihood is that in a near-real-time DW you will be using partition exchange at some point. Which means we need to understand how the stats might be gathered and then aggregated up to the partition and table-level stats.

Although there might be other approaches, I'd say that there are two distinct approaches you are likely to use.

1) Create a temporary load table, load it with data, gather statistics on it and then exchange it with the relevant subpartition in the real table.

2) Create a temporary load table, load it with data, exchange it with the relevant subpartition and then gather stats on the subpartition.

Pete Scott left a comment on a previous post stating that he rarely uses approach 1 so no doubt he'll leave another comment here expanding on his reasons ;-) What I want to show you is what happens if you do use approach 1 and introduce the _minimal_stats_aggregation hidden parameter that's been kicking around since Oracle 8i. The default setting of the parameter is TRUE, which means that Oracle minimises automatic stats aggregation activity. Let's see that in action.

First of all I'll recreate TEST_TAB1 as it was at the start of the series and add a new partition (and, by implication, the related subpartitions) and create a seperate table that I'll load the data into.

SQL> ALTER TABLE TEST_TAB1
  2  ADD  PARTITION P_20100209 VALUES LESS THAN (20100210);

Table altered.

SQL> DROP TABLE LOAD_TAB1;

Table dropped.

SQL> CREATE TABLE LOAD_TAB1
  2  AS SELECT * FROM TEST_TAB1 WHERE 1=0;

Table created.

SQL> CREATE UNIQUE INDEX LOAD_TAB1_IX1 ON LOAD_TAB1
  2  (REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)
  3  NOPARALLEL COMPRESS 1;

Index created.

Now I'll use LOAD_TAB1 to repeat the same process for the four different subpartitions - INSERT data into LOAD_TAB1, gather stats on it and then exchange it with the relevant subpartition of TEST_TAB1.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 600, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 900, 'Z');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_GROT WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION  P_20100209_GROT REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 400, 'U');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 600, 'U');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 900, 'U');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_JUNE WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION  P_20100209_JUNE REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 400, 'N');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 600, 'N');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 900, 'N');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_HALO WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION  P_20100209_HALO REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 600, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 900, 'Z');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_OTHERS WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION  P_20100209_OTHERS REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

All of the P_20100209 subpartitions have stats that were swapped in as part of the partition exchange operation so hopefully there'll be some aggregated global statistics.

SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS                                              
------------------------------ --- -------------------- ----------                                              
TEST_TAB1                      NO                                                                               

SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS               
------------------------------ ------------------------------ --- -------------------- ----------               
TEST_TAB1                      P_20100131                     NO                                                
TEST_TAB1                      P_20100201                     NO                                                
TEST_TAB1                      P_20100202                     NO                                                
TEST_TAB1                      P_20100203                     NO                                                
TEST_TAB1                      P_20100204                     NO                                                
TEST_TAB1                      P_20100205                     NO                                                
TEST_TAB1                      P_20100206                     NO                                              
TEST_TAB1                      P_20100207                     NO                                                  
TEST_TAB1                      P_20100209                     NO                                                   

9 rows selected.

Oh, well, that doesn't seem to have worked. Maybe the LOAD_TAB1 stats weren't gathered correctly or didn't appear as part of the subpartition exchange operation?

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS                                   
------------------------------ ------------------------------ --- -------------------- ----------                                    
TEST_TAB1                      P_20100131_GROT                NO                                                         
TEST_TAB1                      P_20100131_HALO                NO                                                                    
TEST_TAB1                      P_20100131_JUNE                NO                                                        
TEST_TAB1                      P_20100131_OTHERS              NO                                                

<<output snipped>>

TEST_TAB1                      P_20100209_GROT                NO  28-FEB-2010 21:41:47          3                
TEST_TAB1                      P_20100209_HALO                NO  28-FEB-2010 21:41:49          3                
TEST_TAB1                      P_20100209_JUNE                NO  28-FEB-2010 21:41:49          3                
TEST_TAB1                      P_20100209_OTHERS              NO  28-FEB-2010 21:41:50          3                

36 rows selected.


The subpartition stats are ok, then, but the aggregation process hasn't happened and that's because _miminal_stats_aggregation is set to TRUE (the default) which instructs Oracle to minimise aggregation operations and one of the ways it does so is to not aggregate statistics as a result of a partition exchange operation but to leave you to do that manually by gathering stats on the table partition. If we were to modify the parameter to a non-default value (and, being an underscore parameter, that's your own choice at your own risk ...), we would see different behaviour. I ran the same script, but with this small addition that changes the parameter setting at the session level.

SQL> alter session set "_minimal_stats_aggregation"=FALSE;

Session altered.

Which will change the end result to this ...

SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS                                             
------------------------------ --- -------------------- ----------                                             
TEST_TAB1                      NO                                                                              

SQL>
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS              
------------------------------ ------------------------------ --- -------------------- ----------              
TEST_TAB1                      P_20100131                     NO                                               
TEST_TAB1                      P_20100201                     NO                                               
TEST_TAB1                      P_20100202                     NO                                               
TEST_TAB1                      P_20100203                     NO                                               
TEST_TAB1                      P_20100204                     NO                                               
TEST_TAB1                      P_20100205                     NO                                               
TEST_TAB1                      P_20100206                     NO                                               
TEST_TAB1                      P_20100207                     NO                                               
TEST_TAB1                      P_20100209                     NO  28-FEB-2010 21:41:53         12              

9 rows selected.

Note that there are still no statistics at the table level because not all of the partitions have stats yet, so aggregation can't take place, but there are aggregated statistics on the P_20100209 partition, because all of the relevant subpartitions do have stats.

All you need to remember is that the default setting of _minimal_stats_aggregation means that, unless you explicitly gather statistics on the partitions you've just exchanged, aggregation will not take place! Actually, copying stats will also invoke the aggregation process too, but I'll deal with that in the next post. (Updated later. That last sentence might not be true. I've just tried something at home and I'm seeing different results at work, so more investigation needed.)

Oh, and there's much more on this subject over on Randolf Geist's blog post.

Statistics on Partitioned Tables – Part 4

In the last post I illustrated the problems you can run into when you rely on Oracle to aggregate statistics on partitions or subpartitions to generate estimated Global Statistics at higher levels of the table. Until there are statistics for all of the relevant structures then aggregation won’t take place so, for example, if you have statistics for three out of four subpartitions, there won’t be any aggregated global statistics on the related partition until you gather statistics on the fourth subpartition. Randolf Geist left a comment describing how you might avoid problems with this.

"In order to solve the issue of adding partitions with initially missing
statistics screwing up the aggregated statistics it was taken care that
newly added subpartitions got their statistics immediately updated (with
0 rows in that case) – which didn't take a lot of time since the
subpartitions were empty and it solved the issue with the aggregated
statistics.
"

That’s what our system does, but we introduced a change in the last release that caused the problems that inspired this series of posts …

First let’s start with an empty table (definition hasn’t changed since the first post). Now, because we are so paranoid about partitions without stats, we’ll gather statistics at the PARTITION level even though the table is empty at the moment. I’m not going to specify a partition name here to cut the text back a bit, but on the real system we would have. Regardless, we’ll still see the same problematic end result.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'PARTITION');

PL/SQL procedure successfully completed.

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:04:24          0

OK, so the table statistics aren’t true Global Statistics but that’s ok, we know about that. We also know that there’s no data in the table at this stage so the stats reflect that. When we look at the Partition level stats :-

SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

They are true global statistics, albeit on no data at this stage, but at least we have some statistics to reflect that. Looking at the Subpartition stats :-

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO
TEST_TAB1                      P_20100131_HALO                NO
TEST_TAB1                      P_20100131_JUNE                NO
TEST_TAB1                      P_20100131_OTHERS              NO
TEST_TAB1                      P_20100201_GROT                NO
TEST_TAB1                      P_20100201_HALO                NO
TEST_TAB1                      P_20100201_JUNE                NO
TEST_TAB1                      P_20100201_OTHERS              NO

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                NO
TEST_TAB1                      P_20100206_HALO                NO
TEST_TAB1                      P_20100206_JUNE                NO
TEST_TAB1                      P_20100206_OTHERS              NO
TEST_TAB1                      P_20100207_GROT                NO
TEST_TAB1                      P_20100207_HALO                NO
TEST_TAB1                      P_20100207_JUNE                NO
TEST_TAB1                      P_20100207_OTHERS              NO

32 rows selected.

No subpartition stats at all at this stage which is expected behaviour and we’ll be gathering them later after we load the data. I’m going to skip the column statistics at this stage because I don’t need them to illustrate the problem. So let’s imagine that on the live system we’ve just created the partitions above and are about to load data into the P_20100206_GROT subpartition.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 100000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 3000000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 200000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 110000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 240000, 'U');

1 row created.

SQL> COMMIT;

Commit complete.

Next our normal stats gathering approach is invoked and we gather stats on the subpartition just loaded.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION',
                                         PARTNAME => 'P_20100206_GROT');

PL/SQL procedure successfully completed.

N.B. It’s probably worth pointing out at this stage that I put a short pause
in the test script between the original stats gathering on the empty
table and the INSERTs and gather on the newly-loaded subpartition so you might want
to pay attention to the LAST_ANALYZED values here.

So how do the stats look?

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:25          0

Mmmmmm …. I can see that the LAST_ANALYZED time has been updated, but NUM_ROWS is still 0 at the table level. How about the partitions?

SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

So, as far as the Table and Partition Statistics look, this table is still empty! That’s not good and I can imagine a near future of execution plans with CARDINALITY=1 and MERGE JOIN CARTESIAN. Looking at the LAST_ANALYSED values on the Partitions, I can see that the timestamp hasn’t changed, which is another sign that something is wrong.

I’ll check that the subpartition stats were gathered correctly.

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO
TEST_TAB1                      P_20100131_HALO                NO
TEST_TAB1                      P_20100131_JUNE                NO
TEST_TAB1                      P_20100131_OTHERS              NO
TEST_TAB1                      P_20100201_GROT                NO
TEST_TAB1                      P_20100201_HALO                NO
TEST_TAB1                      P_20100201_JUNE                NO
TEST_TAB1                      P_20100201_OTHERS              NO

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                YES 28-FEB-2010 08:06:25          5
TEST_TAB1                      P_20100206_HALO                NO
TEST_TAB1                      P_20100206_JUNE                NO
TEST_TAB1                      P_20100206_OTHERS              NO
TEST_TAB1                      P_20100207_GROT                NO
TEST_TAB1                      P_20100207_HALO                NO
TEST_TAB1                      P_20100207_JUNE                NO
TEST_TAB1                      P_20100207_OTHERS              NO

32 rows selected.

Ah, perhaps that's what the problem is. Only one of the P_20100206 subpartitions has valid stats so Oracle can not generate aggregated Global Stats at the higher levels  of the table. So I'll try to fix that by gathering statistic on all of the subpartitions in the table. (In fact, I only really need to gather stats on the remaining P_20100206 subpartitions but I'll use this approach for brevity)

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION');

PL/SQL procedure successfully completed.

Let’s check that all of the subpartitions have valid statistics now.

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_OTHERS              YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_OTHERS              YES 28-FEB-2010 08:06:25          0

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                YES 28-FEB-2010 08:06:25          5
TEST_TAB1                      P_20100206_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100206_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100206_OTHERS              YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_OTHERS              YES 28-FEB-2010 08:06:25          0

32 rows selected.

OK, so Oracle should have aggregated the subpartition stats to generate global stats on the partitions and table.

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:25          0

SQL>
SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

So, according to the table and the partition stats, the table is still empty and those partition statistics still haven’t been updated!

The problem here is that Oracle won’t overwrite true global stats with aggregated global stats. When you think about it, that’s a sensible approach because if I have a strategy of collecting Table and Partition stats (i.e. the Oracle-recommended strategy covered in the first post) then the last thing I want is those global stats constantly being overwritten by aggregated stats (with incorrect NDVs) when stats are gathered on subpartitions!

Our mistake here could be viewed as a combination of a) not following the Oracle recommendations (because if we did, we’d also be gathering global stats on the Table and Partitions using a seperate task and b) once we depart from that strategy, gathering stats at the incorrect level. Those Partition stats that we gathered can never be over-written except by gathering stats again on the Partitions, which would then be aggregated up to the table level.

Allowing for the fact we want to (have to?), use our current approach, we should only ever gather stats at the SUBPARTITION level which will then be aggregated up to the Table and the Partition level.

As for the fix, we deleted the existing stats, to rid the partitions of their global stats and then regathered at the SUBPARTITION level as a one-off exercise.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION');

PL/SQL procedure successfully completed.

The important change is that we now have aggregated stats at both the Table and Partition levels which can then be updated by the aggregation process as we gather stats on new SUBPARTITIONS. Checking the statistics on the Table and Partitions …

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:26          5

SQL>
SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100201                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100202                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100203                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100204                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100205                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100206                     NO  28-FEB-2010 08:06:26          5
TEST_TAB1                      P_20100207                     NO  28-FEB-2010 08:06:26          0

8 rows selected

All of the partition stats have been updated and are now aggregated rather than true global stats. A modification to the metadata that our stats process uses to change the
granularity from PARTITION to SUBPARTITION will ensure stats are always gathered at the subpartition level and stop the problem from
re-occuring.

You could argue that we could have avoided all of this by just using the default stats gathering strategy and not try to be too clever, but we would really struggle to support the required additional workload. Oh, and this example makes the problem obvious because the stats were
gathered on empty partitions, we knew we’d done so and it was relatively easy to spot zero-row partitions, but imagine if
someone gathered statistics on your partitions manually for some reason
(it wouldn’t be difficult to decide that seemed sensible) and the row counts for partitions are several million or so, frozen and
stuck that way forever until someone decides to repeat the process? Would you really notice the aggregation process
wasn’t working for some reason?

Regardless of whether the problem is self-inflicted, as soon as we
spotted this mistake, I could imagine others making the same mistake if
they don’t understand the aggregation process fully.

In the next few posts I’ll look at some of the new approaches Oracle has introduced which we’ve investigated, to see if they can help us to gather better global statistics or reduce our stats-gathering workload.

Statistics on Partitioned Tables – Part 4

In the last post I illustrated the problems you can run into when you rely on Oracle to aggregate statistics on partitions or subpartitions to generate estimated Global Statistics at higher levels of the table. Until there are statistics for all of the relevant structures then aggregation won’t take place so, for example, if you have statistics for three out of four subpartitions, there won’t be any aggregated global statistics on the related partition until you gather statistics on the fourth subpartition. Randolf Geist left a comment describing how you might avoid problems with this.

"In order to solve the issue of adding partitions with initially missing
statistics screwing up the aggregated statistics it was taken care that
newly added subpartitions got their statistics immediately updated (with
0 rows in that case) – which didn't take a lot of time since the
subpartitions were empty and it solved the issue with the aggregated
statistics.
"

That’s what our system does, but we introduced a change in the last release that caused the problems that inspired this series of posts …

First let’s start with an empty table (definition hasn’t changed since the first post). Now, because we are so paranoid about partitions without stats, we’ll gather statistics at the PARTITION level even though the table is empty at the moment. I’m not going to specify a partition name here to cut the text back a bit, but on the real system we would have. Regardless, we’ll still see the same problematic end result.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'PARTITION');

PL/SQL procedure successfully completed.

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:04:24          0

OK, so the table statistics aren’t true Global Statistics but that’s ok, we know about that. We also know that there’s no data in the table at this stage so the stats reflect that. When we look at the Partition level stats :-

SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

They are true global statistics, albeit on no data at this stage, but at least we have some statistics to reflect that. Looking at the Subpartition stats :-

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO
TEST_TAB1                      P_20100131_HALO                NO
TEST_TAB1                      P_20100131_JUNE                NO
TEST_TAB1                      P_20100131_OTHERS              NO
TEST_TAB1                      P_20100201_GROT                NO
TEST_TAB1                      P_20100201_HALO                NO
TEST_TAB1                      P_20100201_JUNE                NO
TEST_TAB1                      P_20100201_OTHERS              NO

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                NO
TEST_TAB1                      P_20100206_HALO                NO
TEST_TAB1                      P_20100206_JUNE                NO
TEST_TAB1                      P_20100206_OTHERS              NO
TEST_TAB1                      P_20100207_GROT                NO
TEST_TAB1                      P_20100207_HALO                NO
TEST_TAB1                      P_20100207_JUNE                NO
TEST_TAB1                      P_20100207_OTHERS              NO

32 rows selected.

No subpartition stats at all at this stage which is expected behaviour and we’ll be gathering them later after we load the data. I’m going to skip the column statistics at this stage because I don’t need them to illustrate the problem. So let’s imagine that on the live system we’ve just created the partitions above and are about to load data into the P_20100206_GROT subpartition.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 100000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 3000000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 200000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 110000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 240000, 'U');

1 row created.

SQL> COMMIT;

Commit complete.

Next our normal stats gathering approach is invoked and we gather stats on the subpartition just loaded.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION',
                                         PARTNAME => 'P_20100206_GROT');

PL/SQL procedure successfully completed.

N.B. It’s probably worth pointing out at this stage that I put a short pause
in the test script between the original stats gathering on the empty
table and the INSERTs and gather on the newly-loaded subpartition so you might want
to pay attention to the LAST_ANALYZED values here.

So how do the stats look?

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:25          0

Mmmmmm …. I can see that the LAST_ANALYZED time has been updated, but NUM_ROWS is still 0 at the table level. How about the partitions?

SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

So, as far as the Table and Partition Statistics look, this table is still empty! That’s not good and I can imagine a near future of execution plans with CARDINALITY=1 and MERGE JOIN CARTESIAN. Looking at the LAST_ANALYSED values on the Partitions, I can see that the timestamp hasn’t changed, which is another sign that something is wrong.

I’ll check that the subpartition stats were gathered correctly.

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO
TEST_TAB1                      P_20100131_HALO                NO
TEST_TAB1                      P_20100131_JUNE                NO
TEST_TAB1                      P_20100131_OTHERS              NO
TEST_TAB1                      P_20100201_GROT                NO
TEST_TAB1                      P_20100201_HALO                NO
TEST_TAB1                      P_20100201_JUNE                NO
TEST_TAB1                      P_20100201_OTHERS              NO

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                YES 28-FEB-2010 08:06:25          5
TEST_TAB1                      P_20100206_HALO                NO
TEST_TAB1                      P_20100206_JUNE                NO
TEST_TAB1                      P_20100206_OTHERS              NO
TEST_TAB1                      P_20100207_GROT                NO
TEST_TAB1                      P_20100207_HALO                NO
TEST_TAB1                      P_20100207_JUNE                NO
TEST_TAB1                      P_20100207_OTHERS              NO

32 rows selected.

Ah, perhaps that's what the problem is. Only one of the P_20100206 subpartitions has valid stats so Oracle can not generate aggregated Global Stats at the higher levels  of the table. So I'll try to fix that by gathering statistic on all of the subpartitions in the table. (In fact, I only really need to gather stats on the remaining P_20100206 subpartitions but I'll use this approach for brevity)

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION');

PL/SQL procedure successfully completed.

Let’s check that all of the subpartitions have valid statistics now.

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_OTHERS              YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_OTHERS              YES 28-FEB-2010 08:06:25          0

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                YES 28-FEB-2010 08:06:25          5
TEST_TAB1                      P_20100206_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100206_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100206_OTHERS              YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_OTHERS              YES 28-FEB-2010 08:06:25          0

32 rows selected.

OK, so Oracle should have aggregated the subpartition stats to generate global stats on the partitions and table.

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:25          0

SQL>
SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

So, according to the table and the partition stats, the table is still empty and those partition statistics still haven’t been updated!

The problem here is that Oracle won’t overwrite true global stats with aggregated global stats. When you think about it, that’s a sensible approach because if I have a strategy of collecting Table and Partition stats (i.e. the Oracle-recommended strategy covered in the first post) then the last thing I want is those global stats constantly being overwritten by aggregated stats (with incorrect NDVs) when stats are gathered on subpartitions!

Our mistake here could be viewed as a combination of a) not following the Oracle recommendations (because if we did, we’d also be gathering global stats on the Table and Partitions using a seperate task and b) once we depart from that strategy, gathering stats at the incorrect level. Those Partition stats that we gathered can never be over-written except by gathering stats again on the Partitions, which would then be aggregated up to the table level.

Allowing for the fact we want to (have to?), use our current approach, we should only ever gather stats at the SUBPARTITION level which will then be aggregated up to the Table and the Partition level.

As for the fix, we deleted the existing stats, to rid the partitions of their global stats and then regathered at the SUBPARTITION level as a one-off exercise.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION');

PL/SQL procedure successfully completed.

The important change is that we now have aggregated stats at both the Table and Partition levels which can then be updated by the aggregation process as we gather stats on new SUBPARTITIONS. Checking the statistics on the Table and Partitions …

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:26          5

SQL>
SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100201                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100202                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100203                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100204                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100205                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100206                     NO  28-FEB-2010 08:06:26          5
TEST_TAB1                      P_20100207                     NO  28-FEB-2010 08:06:26          0

8 rows selected

All of the partition stats have been updated and are now aggregated rather than true global stats. A modification to the metadata that our stats process uses to change the
granularity from PARTITION to SUBPARTITION will ensure stats are always gathered at the subpartition level and stop the problem from
re-occuring.

You could argue that we could have avoided all of this by just using the default stats gathering strategy and not try to be too clever, but we would really struggle to support the required additional workload. Oh, and this example makes the problem obvious because the stats were
gathered on empty partitions, we knew we’d done so and it was relatively easy to spot zero-row partitions, but imagine if
someone gathered statistics on your partitions manually for some reason
(it wouldn’t be difficult to decide that seemed sensible) and the row counts for partitions are several million or so, frozen and
stuck that way forever until someone decides to repeat the process? Would you really notice the aggregation process
wasn’t working for some reason?

Regardless of whether the problem is self-inflicted, as soon as we
spotted this mistake, I could imagine others making the same mistake if
they don’t understand the aggregation process fully.

In the next few posts I’ll look at some of the new approaches Oracle has introduced which we’ve investigated, to see if they can help us to gather better global statistics or reduce our stats-gathering workload.