MapR Sandbox for Hadoop Learning

I got an email about the MapR Sandbox, a fully functional Hadoop cluster running on a virtual machine (CentOS 6.5) that provides an intuitive web interface for both developers and administrators to get started with Hadoop. I believe it's a good way to learn about Hadoop and its ecosystem. Users can download it for VMware or VirtualBox. I downloaded the VirtualBox version and imported it. I changed the network setting to use "Bridged Adapter". After starting... (more...)

NoSQL? No Thanks

There continues to be a disproportionate amount of hype around 'NoSQL' data stores.  By disproportionate I mean 'completely and utterly out of scale with the actual problems of the vast majority of companies'.  I wrote before about 'how NoSQL became more SQL'.  The point I made there is now more apparent the more I work with companies on Big Data challenges. There are three worlds of data

Big Data : Right Approach Right Solution

 

Hi All,

For the past few months I have been meeting with clients and discussing their potential need for Big Data. The discussion always comes down to one question: do they really need Big Data? My ITNext article, linked below, covers this: as big data gets bigger, IT managers are challenged with the task of identifying data that qualifies as big and finding appropriate solutions to process it.

Click Here To Read Full Article (more...)

Six things to make your Big Data project succeed

I wrote about why your Hadoop project will fail, so I think it's only right that I follow up with some things you can do to actually make the Big Data project you take on succeed.  The first thing you need to do is stop trying to make 'Big Data' succeed and instead start focusing on how you educate the business on the value of information and then work out how to deliver new (more...)

The Twelve Days of NoSQL: Day Ten: Big Data

On the tenth day of Christmas, my true love gave to me Ten lords a-leaping. The topic of Big Data is often encountered when talking about NoSQL so let’s give it a nod. In 1998, Sergey Brin and Larry Page invented an algorithm for ranking web pages (The Anatomy of a Large-Scale Hypertextual Web Search […]

My history with Big Data

Before I joined Cloudera, I hadn't had much formal experience with Big Data. But I had crossed paths with one of its major use cases before, so I found it easy to pick up the mindset. My previous big project involved a relational database hooked up to a web server. (more...)

RDBMS and their bundle-mates

Relational DBMS used to be fairly straightforward product suites, which boiled down to:

  • A big SQL interpreter.
  • A bunch of administrative and operational tools.
  • Some very optional add-ons, often including an application development tool.

Now, however, most RDBMS are sold as part of something bigger.

Comments on the 2013 Gartner Magic Quadrant for Operational Database Management Systems

The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request, in each case the point being that OLTP (OnLine Transaction Processing) is a dubious term when systems omit strict consistency, and when even strictly consistent systems may (more...)

My Strata + Hadoop World Schedule

Next week is Strata + Hadoop World, which is bound to be exciting for those who deal with big data on a daily basis.  I’ll be spending my time talking about Cloudera Impala at various places, so I’m posting my schedule for those interested in catching up about fast SQL on (more...)

Hadoop Streaming, Hue, Oozie Workflows, and Hive

Elephant Painting

MapReduce with Hadoop Streaming in bash – Bonus!

MapReduce with Hadoop Streaming in bash – Part 3

Hadoop Streaming Bash

In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’–meaning we counted the number of times each word in (more...)
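The term-frequency idea mentioned above can be sketched locally without a cluster. This is only an illustration in Python, not the bash code from the article: the function names (`mapper`, `reducer`, `term_frequency`) and the sample text are assumptions made for this sketch. It mimics the Hadoop Streaming convention of tab-separated key/value records, with a plain sort standing in for the shuffle between the map and reduce phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, like a streaming mapper writing to stdout."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per word. Records must arrive sorted by key."""
    parsed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)


def term_frequency(lines):
    """Run map, sort (a local stand-in for the shuffle), and reduce."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical sample input, not the poems used in the article.
    sample = ["the sea and the sky", "the sky"]
    for word, count in sorted(term_frequency(sample).items()):
        print(f"{word}\t{count}")
```

In a real streaming job the mapper and reducer would be separate executables reading standard input, and Hadoop would handle the sort between them.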

MapReduce with Hadoop Streaming in bash – Part 2

In MapReduce with Hadoop Streaming in bash – Part 1 we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce (more...)

TF-IDF with Hadoop Streaming in bash – Part 1

So to commemorate my recent certification and because my Java absolutely sucks, I decided to do a common algorithm using Hadoop Streaming.

Hadoop Streaming allows you to write MapReduce code in any language that can process (more...)

New York Oracle User Group Fall Conference Materials

Thank you to all who attended my sessions at the NYOUG Fall Conference this morning. I appreciate you spending your most precious commodity - your time - with me. I sincerely hope you found both presentations enlightening as well as entertaining.

Please see the details of the sessions below along with the (more...)

Cloudera Sentry and other security subjects

I chatted with Charles Zedlewski of Cloudera on Thursday about security — especially Cloudera’s new offering Sentry — and other Hadoop subjects.

Sentry is:

  • Developed by Cloudera.
  • An Apache incubator project.
  • Slated to be rolled into CDH — Cloudera’s Hadoop distribution — over the next couple of weeks.
  • Only useful (more...)

Upcoming Talks: OakTable World and Strata + Hadoop World

I haven’t had much time over the past year to do many blog posts, but in the next few months I’ll be doing a few talks about what I’ve been working on over that time, Cloudera Impala, an Open Source MPP SQL query engine for Hadoop.  Hope to see (more...)

“Disruption” in the software industry

I lampoon the word “disruptive” for being badly overused. On the other hand, I often refer to the concept myself. Perhaps I should clarify. :)

You probably know that the modern concept of disruption comes from Clayton Christensen, specifically in The Innovator’s Dilemma and its sequel, The Innovator’s Solution. The basic (more...)

Oracle Corp at useR! Conference 2013 #useR2013 #rstats

This year’s R User Conference took place in Albacete (Spain). The conference has gathered R professionals and enthusiasts from all over the world since 2004, when it was first held in Vienna. The sponsors this year were Revolution Analytics, Google, RStudio, Oracle, and TIBCO. Other companies like OpenAnalytics and Mango Solutions were also present with a booth. Besides sponsoring the (more...)

Demystifying Big Data for Oracle Professionals

Ever wonder about Big Data and what exactly it means, especially if you are already an Oracle Database professional? Or, do you get lost in the jargon warfare that spews out terms like Hadoop, Map/Reduce and HDFS? In this post I will attempt to explain these terms from the perspective (more...)

The 3 ways Hadoop will change your Business Intelligence

“It’s the analytics, stupid!” Obviously no offense is intended toward the dear reader. It’s a wake-up call for all the people who are excited about Hadoop but lack BI vision. The BI people who lack infrastructure vision are also to blame. Blame for what? We’ll see later in this (more...)