High log file sync waits? Check log parallelism!

Log parallelism is an optimization introduced in 9.2 that reduces latch contention due to redo copy to the log buffer by enabling multiple public redo buffers (or “strands”). In many cases, however, it can cause a massive degradation of commit performance (“log file sync” wait increase). In this blog post, I will describe the mechanism, illustrate it with test results, and discuss solutions. Tests were performed on several different 11gR2 databases on Solaris servers.

Log parallelism

Log parallelism is controlled by (more...)

CPU-starved LGWR

In my recent post I showed how log file sync (LFS) and log file parallel write (LFPW) look for normal systems. I think it would also be interesting to compare that to the situation when LGWR does not have enough CPU.

I happen to have collected LGWR and database-level trace files for a 11.2.0.3 database on a Solaris 10 server which was under serious pressure (50 threads mostly inserting and committing data, only 32 (more...)

Piggyback commits

Not every commit results in a redo write. This is because there are multiple optimizations (some controlled by the user e.g. with COMMIT_LOGGING parameter, some automatic) that aim at reducing the number of redo writes caused by commits by grouping redo records together. Such group or “piggyback” commits are important for understanding log file sync waits and various statistics around it. In particular, “piggyback” commits play a key role when many sessions commit concurrently (more...)

How not to troubleshoot log file sync waits

Introduction

There is a very common mistake in troubleshooting log file sync (LFS) waits: comparing its average time to average log file parallel write (LFPW) and trying to deduce from that whether the root cause of the wait is slow I/O or something else. The fact that this approach is recommended by Oracle itself (e.g. MOS 1376916.1) and many independent experts unfortunately doesn’t make it any less wrong.

Theory

It is well known that averages and ratios can distort the reality (Milsap and (more...)

Querying trace files

SQL trace file provide the highest level of detail possible about SQL execution. The problem with that information is converting it to a convenient format for further analysis. One very good solution is parsetrc tool by Kyle Hailey written in Perl. It gives high-resolution histograms, I/O transfer rates as a function of time, and other very useful info. Unfortunately, I myself am not a Perl expert, so it’s a bit difficult for me to customize (more...)

Block size and multirow DML peformance

While 8k is the default block size, Oracle supports other block sizes, too. Smaller block sizes (more commonly, 4k) are encouraged for OLTP workloads to reduce concurrency, and even smaller block size, 2k, is recommended for databases running on “Advanced format” (or “4K”) storage. Oracle documentation warns us of possible implications when storing larger rows in 2k/4k blocks, such as potentially larger space overhead due to metadata, or even possibility of row chaining. What it (more...)

Peeling through layers of cache

Nowadays, data in databases is wrapped in may layers of cache: result cache, buffer cache, OS page cache, storage hardware cache… They greatly improve performance, but they also make it less stable and harder to predict. And when I/O performance takes a turn for worse, one has to go through multiple layers of cache trying to understand what went wrong and why. I had such a case not too long ago.

The incident took place (more...)

Row cache lock

A rather odd performance issue happened at work a few days ago. All of a sudden, one of the databases started to work very slowly, and a quick look in ASH data showed that it was spending over 70% of its time waiting on “row cache lock” (not to be confused with “latch: row cache lock”).

It was a test database (11.2.0.3) on an overloaded server with non-standard configuratioin, so my initial reaction (more...)

Caution: hot patches

For many Oracle database patches there is an option to apply them “online”, i.e. without stopping the database and related services (listener, ASM etc.).  This is very convenient when downtime is impossible or difficult to arrange. However, this convenience comes at a certain price, including some performance penalties. Such side effects of hot patching are not well understood, even by Oracle support engineers.

There is a MOS note 761111.1 where in addition to (more...)

DB time vs duration

Performance tuning is all about time. You measure the time it takes for a certain process to complete, and then you search for ways to reduce this time to improve end-users experience and/or increase the application productivity. But minimizing time is not enough — it’s important to minimize the correct (more...)