Big Data


The other day I watched the Oracle Big Data forum, which is now available here. It was a half-day event with various speakers on the subject of Big Data, including Tom Kyte, a mentor whom I admire!

The forum went over Oracle's approach to Big Data, which I summarise below:
  1. Acquire - Collect Big Data and identify it (where is it?), then store it in Oracle NoSQL, a key-value database

  2. Organise - Stage Big Data in a transient elastic database. Using Oracle Data Integrator and the Oracle Hadoop connector, reduce and distil it.

  3. Analyse - Start Analytics on (more...)

Getting started with Apache Pig

If, like me, you want to play around with data in a Hadoop cluster without having to write hundreds or thousands of lines of Java MapReduce code, you will most likely use either Hive (with its query language, HiveQL, also called HQL) or Pig.

Hive provides a SQL-like language which compiles to Java MapReduce jobs, while Pig is a data flow language which allows you to specify your MapReduce data pipelines using high-level abstractions.
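
To make the contrast concrete, here is a minimal sketch in plain Python. It does not use Hadoop, Hive or Pig at all; the function names and the sample data are my own illustrative assumptions. It counts words twice: once by spelling out the map, shuffle and reduce phases by hand, and once as a single high-level data flow step.

```python
from collections import Counter, defaultdict

# Hypothetical sample input; a real job would read files from HDFS.
LINES = ["big data big cluster", "pig and hive", "big pig"]


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, the step a framework runs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def word_count_low_level(lines):
    """Word count with every phase written out by hand."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


def word_count_high_level(lines):
    """The same result stated as one data flow step: split, then count."""
    return dict(Counter(word for line in lines for word in line.split()))


if __name__ == "__main__":
    print(word_count_low_level(LINES))
    print(word_count_high_level(LINES))
```

Both functions return the same counts. The second only says what should happen to the data, which is roughly the level at which a Hive query or a Pig script lets you work.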

The way I like to think of it is that writing Java MapReduce is like programming in assembler: you need to manually construct every low-level (more...)

Buzz Around Non-Relational DBs

Reposting from my other blog http://texploration.wordpress.com/2011/03/09/buzz-around-nonrelational-db/


Last Saturday we (GITPRO – Global Indian Tech Professionals Association) arranged a Tech Talk on NoSQL (non-relational, actually) DBs and scaling Hadoop. It was very well attended. In the general introduction session, when many attendees introduced themselves, they mentioned their interest in Hadoop and (more...)