My Strata + Hadoop World Schedule

Next week is Strata + Hadoop World, which is bound to be exciting for those who deal with big data on a daily basis. I’ll be spending my time talking about Cloudera Impala at various places, so I’m posting my schedule for those interested in catching up on fast SQL on (more...)

Upcoming Talks: OakTable World and Strata + Hadoop World

I haven’t had much time over the past year to do many blog posts, but in the next few months I’ll be doing a few talks about what I’ve been working on during that time: Cloudera Impala, an open source MPP SQL query engine for Hadoop. Hope to see (more...)

OakTable World

This weekend begins Oracle’s OpenWorld and there is something off the beaten path going on — OakTable World. There is a pretty good lineup of speakers, including myself, who will be talking on subjects that probably would not qualify for OpenWorld sessions for various reasons. My talk is entitled “Beyond the Relational Database” and in it I’ll be talking about non-relational (Big Data) technologies and what they have to offer. Be sure to bring an open mind! Hope to see you there.

To make things easy, here is an iCal file for my session.

Linux 6 Transparent Huge Pages and Hadoop Workloads

This past week I spent some time setting up and running various Hadoop workloads on my CDH cluster. After some Hadoop jobs had been running for several minutes, I noticed something quite alarming — the system CPU percentages were extremely high.

Platform Details

This cluster comprises 2s8c16t (2 sockets, 8 cores, 16 threads) Xeon L5630 nodes with 96 GB of RAM, running CentOS Linux 6.2 with Java 1.6.0_30. The details are:

$ cat /etc/redhat-release
CentOS release 6.2 (Final)

$ uname -a
Linux chaos 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012  (more...)
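
Since this entry is about Transparent Huge Pages, here is a quick sketch of how one might check (and, if THP turns out to be the culprit, switch off) the feature on a box like this. Treat it as a sketch rather than a prescription: RHEL/CentOS 6 exposes the backported feature under the redhat_transparent_hugepage path, while upstream kernels use transparent_hugepage, so adjust the path for your kernel.

$ # show the current THP mode; the value in brackets is the active one
$ cat /sys/kernel/mm/redhat_transparent_hugepage/enabled

$ # disable THP and its defragmentation for the current boot (run as root)
$ echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
$ echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag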

Pitfalls of Using Parallel Execution with SQL Developer

[This post was originally published on 2012/02/29 and was hidden shortly thereafter. I'm un-hiding it as of 2012/05/30 with some minor edits.]

Many Oracle Database users like GUI tools because they add features and functionality that are not easily available from command-line interfaces like SQL*Plus. One of the more popular tools, in my experience, is Oracle SQL Developer, in part because it’s a free tool from Oracle. Given SQL Developer’s current design (as of version 3.1.07.42), some issues frequently show up when using it with Oracle Database and Parallel Execution. SQL Developer (more...)
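
When chasing parallel execution issues from any client tool, one quick sanity check is whether PX servers remain allocated after the query looks finished in the GUI. A minimal sketch of that check, assuming SQL*Plus access and select privileges on the V$ views (the quoted heredoc delimiter keeps the shell from expanding the $ in the view names):

$ sqlplus -s / as sysdba <<'EOF'
-- how many PX servers exist and whether they are in use or available
select status, count(*) from v$px_process group by status;
-- which sessions (by query coordinator SID) are currently holding PX servers
select qcsid, count(*) as px_servers from v$px_session group by qcsid;
EOF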

Visualizing Active Session History (ASH) Data With R

One of the easiest ways to understand something is to see a visualization. Looking at Active Session History (ASH) data is no exception, so I’ll dive into how to do that with R and how I used R plots to visually present a problem and confirm a hypothesis. But first, some background…

Background

DBAs frequently use the Automatic Workload Repository (AWR) as an entry point for troubleshooting performance problems, and in this case the adventure started the same way. In the AWR report’s Top 5 Timed Foreground Events, the log file sync event was showing up as the #3 (more...)
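
As a rough illustration of the kind of ASH extract that feeds such a plot (in R or anything else), the sketch below buckets log file sync samples by minute. It assumes SQL*Plus access and a Diagnostics Pack license for V$ACTIVE_SESSION_HISTORY.

$ sqlplus -s / as sysdba <<'EOF'
-- ASH keeps roughly one sample per active session per second,
-- so the sample count per minute approximates the time spent waiting
select to_char(sample_time, 'HH24:MI') as minute,
       count(*) as samples
from   v$active_session_history
where  event = 'log file sync'
group by to_char(sample_time, 'HH24:MI')
order by 1;
EOF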

UKOUG – A Deep Dive into the SQL Monitoring Report

The UK Oracle User Group Conference 2011 is just around the corner, and I realized that I haven’t yet highlighted this beyond the “I’m speaking” banner on my blog. I’ll be speaking on one of my favorite and most used reports — the SQL Monitor Report. Below are the session details [direct link]. Hope to see you there!

Presentation Title: A Deep Dive into the SQL Monitoring Report
Speaker: Mr Greg Rahn
Company: Oracle
Presentation abstract: The SQL Monitoring Report was introduced in Oracle Database 11g and has become the single most used diagnostic report by the Oracle (more...)
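
For anyone who has not used it yet, the report the abstract refers to can be generated straight from SQL*Plus. A minimal sketch (the SQL_ID below is just a placeholder), assuming Oracle Database 11g and the Tuning Pack license that SQL Monitoring requires:

$ sqlplus -s / as sysdba <<'EOF'
set long 1000000 longchunksize 1000000 linesize 250 pagesize 0 trimspool on
-- text rendering of the SQL Monitor report for one statement; 'HTML' and 'ACTIVE' are also valid types
select dbms_sqltune.report_sql_monitor(sql_id => 'abcd1234wxyz9', type => 'TEXT') from dual;
EOF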

Counting Triangles Faster

A few weeks back, one of the Vertica developers put up a blog post on counting triangles in an undirected graph with reciprocal edges. The author compared the size of the data and the elapsed times to run this calculation on Hadoop and Vertica, put the work up on GitHub, and encouraged others to “do try this at home.” So I did.

Compression

Vertica draws attention to the fact that their compression brought the 86,220,856 tuples down to 560MB, from a flat file size of 1,263,234,543 bytes, resulting in around a 2.25X (more...)
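
For reference, the quoted ratio follows directly from those two numbers, treating the 560MB figure as decimal megabytes:

$ echo 'scale=2; 1263234543 / (560 * 1000 * 1000)' | bc
2.25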

Exadata Smart Flash Logging Explained

I’ve seen some posts in the blogosphere where people attempt to explain (or should I say guess) how Exadata Smart Flash Logging works, and most of them are wrong. Hopefully this post will help clear up some of the misconceptions out there.

The following is an excerpt from the paper entitled “Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine” that goes into technical detail on the Exadata Smart Flash Logging feature.

Smart Flash Logging works as follows. When receiving a redo log write request, Exadata will do parallel writes to the on-disk redo logs as well (more...)

Critical Skills for Performance Work

I was just watching John Rauser’s keynote “What is a Career in Big Data?” from last week’s Strata Conference New York and I have to say it’s an amazing talk. I would highly recommend it to anyone who does any type of data analysis, including any type of performance analysis.

I found many of the “critical skill” points John made to have a strong correlation to performance analysis work as well. Some quotations that really stand out to me:

On writing:

“[writing]…it’s the first major difference between mediocrity and greatness.” [10:39]

“If it isn’t written down, it (more...)