Coming Out

The last 16 years or so of my professional life have been dedicated to working on problems (and solutions!) in transactional middleware - by this, I mean systems that provide strong consistency guarantees: reliable queueing, distributed 2pc engines, higher level quality of service guarantees for lower level protocols (iiop, (more...)

Linux 6 Transparent Huge Pages and Hadoop Workloads

This past week I spent some time setting up and running various Hadoop workloads on my CDH cluster. After some Hadoop jobs had been running for several minutes, I noticed something quite alarming — the system CPU percentages where extremely high.

Platform Details

This cluster is comprised of 2s8c16t Xeon L5630 nodes with 96 GB of RAM running CentOS Linux 6.2 with java 1.6.0_30. The details of those are:

$ cat /etc/redhat-release
CentOS release 6.2 (Final)

$ uname -a
Linux chaos 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012  (more...)

Big Data

The other day I watched the Oracle Big Data forum. Now available here. A half-day event with various speakers on the subject of BigData, including Tom Kyte , a mentor who I admire!

In the forum they have gone over Oracle's approach to Big Data and allow me to summarise it below:
  1. Acquire - Collect Big Data, identify it, where is it? Then store it in Oracle NoSQL - a value-pair database

  2. Organise - Stage Big Data in a transient elastic database. Using Oracle Data Integrator and the Oracle Hadoop connector, reduce and distil it.

  3. Analyse - Start Analytics on (more...)

Getting started with Apache Pig

If, like me, you want to play around with data in a Hadoop cluster without having to write hundreds or thousands of lines of Java MapReduce code, you most likely will use either Hive (using the  Hive Query Language HQL) or Pig.

Hive is a SQL-like language which compiles to Java map-reduce code, while Pig is a data flow language which allows you to specify your map-reduce data pipelines using high level abstractions. 

The way I like to think of it is that writing Java MapReduce is like programming in assembler:  you need to manually construct every low level (more...)

Buzz Around Non-Relational DBs

Reposting from my other blog

Last Saturday we (GITPRO – Global Indian Tech Professionals Association) arranged Tech Talk on NoSQL (nonRelational actually) DBs and Scaling Hadoop. It was very well attended. In the general introduction session when many introduced themselves they told their interests in Hadoop and (more...)