Swapping (paging) for DBAs

Classic symptoms of memory pressure (free physical memory running low + swapping) are often more difficult to interpret than they seem, especially on modern enterprise grade servers. In this article, I attempt to bring some clarity to the issue. It is based on Linux, although many observations can be generalized to other Unix-like operating systems.

Basics definitions

Let’s define some basic terms and concepts:

Page is simply a unit of memory allocation, pretty much like (more...)

A picture paints a thousand words

When I first learned about Active Session History, it was a real game changer for me. It’s (kinda) like tracing which is always on, for every single session… well, active session, sure, but who cares about idle ones? For a while I got so obsessed with it that I almost stopped using other tools — fortunately, that was a only short while, because as great as ASH is, you still need other tools. But to (more...)

ASH for OS processes

There is a lot of different tools for analyzing OS process states which can be helpful in resolving non-trivial performance issues. One of the limitations of such tools is that they are mostly active ones — i.e. you have to do some extra work to collect the desired diagnostic information. This is inconvenient when the problem you’re facing is intermittent and manifests itself on an irregular and unpredictable schedule.

However, there are some great (more...)

Finding the root cause of “CPU waits” using stack profiling

Call stack profiling and flame graphs have been a hot topic in Oracle tech blogs last few years, and recently I got a chance to use it to troubleshoot an actual production performance issue. It was quite an interesting journey, with some twists and turns along the way. Let me start by presenting some background for the problem.

The database in question (a 3-node Exadata RAC cluster, 11.2.0.4 database version, 18.1. (more...)

Tools for troubleshooting network performance issues

In my previous article I discussed general questions related to network issues in Data Guard due to packet loss and/or retransmissions. Here I’d like to move to discussing specific tools and methodologies for troubleshooting such issues.

Such tools can be broken down by following criteria:

  • server-side or network-side
  • active or passive
  • level of detail they provide (aggregate statistics or individual packet capture).

I think the first item on the list is more or less self-explanatory: (more...)

Troubleshooting network throughput issues in Oracle Data Guard

Introduction

In this article I describe the basic mechanics of TCP and DataGuard as well as relevant performance metrics on the database, OS and network sides. The idea is to give DBAs some ammunition in addressing DataGuard performance issues. The most important stage of troubleshooting is the correct identification of the nature of the issue, e.g. being able to tell whether the problem has to do with the network as such, or DataGuard, or (more...)

Set-based processing

Last week I participated in Oracle’s Real World Performance event — four days of lectures, quizzes, live demos and hands-on exercises. It was quite interesting, even more so than I expected it to be.

Understandably, a lot of time was spent discussing the perils of row-by-row processing. After all, it was Real World Performance, so it was based on performance problems that the authors of the course faced most often. And many, if not most, (more...)

Business rules, common sense and query performance

Very often, significant performance benefits can be obtained by using some very basic knowledge of the application, its data and business rules. Sometimes even less than that: even if you are not familiar with the application logic at all, you can still use common sense to make some reasonable guesses that would get you a long way in improving query’s performance. Here is an example (based on an actual query that I had to tune (more...)

Reducing fetch time

Database query tuning is mostly about getting better plans. Mostly, but not always. Sometimes, the problem has nothing to do with the plan, and you might need to get a bit creative to find a solution. In this recent case a query was showing a decent performance when running from SQL Developer, but it took about 5 times longer to complete when running from R. The plan was the same, so I knew that it (more...)

Long-running INSERT

On one of the databases I’m looking after (11.2.0.4, Solaris, non-RAC), several different INSERT statements (all into tablespaces with manually managed segments) suffer from occasional hiccups. The symptoms are always the same: in one of the sessions, the INSERT gets stuck doing lots of single-block I/O against one of the indexes on the inserted table, and if other sessions are running similar INSERTs, they hang on enq: TX – index contention. The (more...)