Getting started with Apache Pig

If, like me, you want to play around with data in a Hadoop cluster without having to write hundreds or thousands of lines of Java MapReduce code, you most likely will use either Hive (using the  Hive Query Language HQL) or Pig.

Hive is a SQL-like language which compiles to Java map-reduce code, while Pig is a data flow language which allows you to specify your map-reduce data pipelines using high level abstractions. 

The way I like to think of it is that writing Java MapReduce is like programming in assembler:  you need to manually construct every low level (more...)

Buzz Around Non-Relational DBs

Reposting from my other blog

Last Saturday we (GITPRO – Global Indian Tech Professionals Association) arranged Tech Talk on NoSQL (nonRelational actually) DBs and Scaling Hadoop. It was very well attended. In the general introduction session when many introduced themselves they told their interests in Hadoop and (more...)