Open Source Big Data Technologies

Jan 29, 2014

Hi All

While doing a comparative analysis to build a reference architecture for Big Data technology, I stumbled on a very impressive mashup of open source Big Data technologies. Thanks to http://www.bigdata-startups.com/. The most impressive part of this mashup is that it breaks the whole Big Data operational paradigm into multiple stages and lists the open source technologies available for each.

Open Source Big Data Technologies

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


EDW in the Library with Single Canonical Form – get a clue about killing the business

The game Cluedo (or just plain Clue in North America) is about discovering which person committed the murder, in which room, and with what weapon. What is amazing is that in IT we have the easiest game of Cluedo going, and yet, over and over again, we murder the poor unfortunate business in the same way, then stand back and gasp, 'I didn't know that would kill them'. I talk about the EDW, the IT departments

Six things to make your Big Data project succeed

So, having written about why your Hadoop project will fail, I think it's only right that I follow up with some things you can do to actually make the Big Data project you take on succeed. The first thing you need to do is stop trying to make 'Big Data' succeed and instead start focusing on how you educate the business on the value of information, and then work out how to deliver new (more...)

The People’s Democratic Republic of IT

IT is a communist state in many organisations, one that believes in rigid adherence to inflexible approaches, despite clear indications that they inhibit growth, and in a central approach to planning that Mao and Stalin would have thought was taking things a little too far. This really doesn't make sense in the capitalistic world of business, and the counter-revolution is well under way. It's (more...)

Big Data? Start with Right Data

I’m wearing a Nike FuelBand, one of those fitness/activity tracker gizmos. Nike offers both a website and an app showing my daily activity. As a customer, I expect these two to contain the same data. After all, my bank balance is the same in my mobile banking app, at an ATM or in a web browser.

Unfortunately, Nike does not have proper infrastructure behind its gadget, so the numbers do not (more...)

How integration guys created a data security nightmare

There has been a policy in integration that has stored up a really great data security challenge, and by great I don't mean 'fantastic', I mean 'aw crap'. It's a policy adopted for the best of reasons, and one that will increasingly challenge Big Data and federated information in future. The policy can be described as follows: Users authenticate with Apps, Apps

Six reasons your Big Data Hadoop project will fail in 2014

OK, so Hadoop is the bomb, Hadoop is the schizzle, Hadoop is here to solve world hunger and all problems. Now, I've talked before about some of the challenges around Hadoop for enterprises, but here are six reasons why InformationWeek is right when it says that Hadoop projects are going to fail more often than not.

1. Hadoop is a Java thing, not a BI thing

The first is the most important

The Twelve Days of NoSQL: Day Ten: Big Data

On the tenth day of Christmas, my true love gave to me Ten lords a-leaping. The topic of Big Data often comes up when talking about NoSQL, so let’s give it a nod. In 1998, Sergey Brin and Larry Page invented an algorithm for ranking web pages (The Anatomy of a Large-Scale Hypertextual Web Search […]
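The excerpt only names the ranking algorithm (PageRank). For readers who have not seen it, here is a minimal power-iteration sketch in Python over a small hypothetical four-page link graph. The damping factor of 0.85 is the value commonly quoted from the original paper, and spreading the rank of pages without outlinks evenly across all pages is one common convention; neither detail comes from the post above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    pages = sorted(set(links) | {p for outs in links.values() for p in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # Split this page's damped rank equally among its outlinks.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:
                # Dangling page (assumed convention): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical link graph: "c" is linked from every other page.
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

Because every iteration redistributes the full rank mass, the scores keep summing to 1, and a page nobody links to (here "d") ends up with only the random-jump share.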

How To Retrieve/Back Up Views In Endeca

Hi All,

For the last few weeks I have been engaged with a customer, helping them with the remediation of an Endeca project. During remediation I faced a typical challenge, where all the graphs and EQL queries were erroring out. After some research I found out that it's a known issue. I spent a good amount (more...)

Bye Bye 2013!!! Year of Big Data

Hi All,

Wishing all you readers a very happy new year!! 2013 is over and the dawn of 2014 has arrived. It feels like just yesterday, and now here we are, sitting and waiting for the year number to change. By the time I am writing this blog, Australia, Mumbai and Dubai (more...)

The Twelve Days of NoSQL: Day Seven: Schemaless Design

On the seventh day of Christmas, my true love gave to me Seven swans a-swimming. As we discussed on Day One, NoSQL consists of “disruptive innovations” that are gaining steam and moving upmarket. So far, we have discussed functional segmentation (the pivotal innovation), sharding, asynchronous replication, eventual consistency (resulting from (more...)

Things you (probably) don’t need in 2014: Big Data

There is an interesting article on Forbes where Paul Sonderegger from Oracle is making the case that you have to jump onto the “Big Data” bandwagon without delay if you want to avoid your big-data-using competitors crushing you.

But he would say that, wouldn’t he?

In reality, most companies already (more...)

Why, in business-driven information, it's the consumer's view that matters

Working on the Business Data Lake pieces took me back to a view I had around SOA: that you should take the consumer's view when designing a service. This, I think, is even more critical in analytics and reporting, where it really is all about the consumption. What (more...)

How Business SOA thinking impacts data

Over the years I've written quite a bit about how SOA, when viewed as a tool for Business Architecture, can change some of the cherished beliefs in IT. One of these was that the Single Canonical Form was not for SOA, and others have talked about how MDM and (more...)

The only V that counts in Big Data is Value

So what is Big Data? It's Variety, Velocity, Volume, right? But what does that really mean? Should I get loads of data and drop it into Hadoop, pull in anything I can lay my hands on, and then I'm 'doing Big Data'? Should I plug in my packet monitoring software (more...)

How To Find Size Of Table In Hive / HDFS

Nov 19, 2013

Hi All

With volume being the constant challenge in Big Data, as an administrator you have to keep tabs on data growth, and at the same time make sure there is no spurious growth of unwanted objects or folders. Typically you would want to be worried about the (more...)
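The excerpt is cut off before it shows any commands. One common way to see how much space a Hive table occupies is to run `hdfs dfs -du -s` against the table's directory (Hive's default warehouse location is `/user/hive/warehouse`). The Python sketch below assumes the Hadoop 2.x/3.x output layout (size in bytes, then space consumed including replication, then the path; older releases print only size and path), assumes paths contain no whitespace, and uses a hypothetical table name `my_table`. The parser works on plain text, so it can be tried without a cluster.

```python
import subprocess


def parse_du_output(text):
    """Parse `hdfs dfs -du` output (without -h) into (size_in_bytes, path) tuples.

    Assumes each line is "<size> <path>" or "<size> <size_with_replication> <path>"
    and that paths contain no whitespace.
    """
    results = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # First column is the logical size; the last column is the path.
        results.append((int(parts[0]), parts[-1]))
    return results


def table_size_bytes(table_path):
    """Run `hdfs dfs -du -s <path>` (needs the hdfs CLI on PATH) and return total bytes."""
    output = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", table_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return sum(size for size, _ in parse_du_output(output))


if __name__ == "__main__":
    # Illustrative three-column line; the numbers are made up, not real measurements.
    sample = "1048576  3145728  /user/hive/warehouse/my_table\n"
    print(parse_du_output(sample))
```

Adding `-h` to the command makes the output human-readable, but then the first column carries units and the simple `int()` parse above no longer applies.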

Hadoop Streaming, Hue, Oozie Workflows, and Hive

MapReduce with Hadoop Streaming in bash – Bonus!

MapReduce with Hadoop Streaming in bash – Part 3

In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’, meaning we counted the number of times each word in (more...)
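The map/reduce split behind a term-frequency job can be sketched in Python (the posts themselves used bash). This is an in-memory illustration under simplifying assumptions: the mapper keys each count by (word, document), a sort stands in for Hadoop's shuffle, and the reducer sums the counts per key. Lower-casing and whitespace splitting without punctuation handling are assumptions, and the two documents are invented.

```python
from itertools import groupby
from operator import itemgetter


def tf_mapper(doc_name, text):
    """Emit ((word, doc_name), 1) for every word in one document."""
    for word in text.lower().split():
        yield (word, doc_name), 1


def tf_reducer(key, counts):
    """Sum the 1s emitted for one (word, document) key."""
    return key, sum(counts)


def term_frequency(documents):
    """Run map, a local sort-based shuffle, and reduce over {doc_name: text}."""
    mapped = [pair for name, text in documents.items() for pair in tf_mapper(name, text)]
    mapped.sort(key=itemgetter(0))  # stand-in for Hadoop's shuffle-and-sort phase
    return dict(
        tf_reducer(key, (count for _, count in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    )


if __name__ == "__main__":
    # Two made-up documents; the original post used Stephen Crane poems.
    docs = {"poem1.txt": "the heart the heart", "poem2.txt": "the desert"}
    for (word, doc), count in sorted(term_frequency(docs).items()):
        print(f"{word}\t{doc}\t{count}")
```

Keying on the (word, document) pair rather than the word alone is what turns a plain word count into per-document term frequency.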

MapReduce with Hadoop Streaming in bash – Part 2

In MapReduce with Hadoop Streaming in bash – Part 1, we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce (more...)
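Term frequency is the first half of TF-IDF, the algorithm this series builds toward. One widely used form weights each count as tf * log(N / df), where N is the number of documents and df is the number of documents containing the word. The Python sketch below applies that plain variant to (word, document) counts like those a term-frequency job produces. In a MapReduce pipeline the document-frequency count would normally be its own job, and the series may use a different weighting; the counts in the example are invented.

```python
import math
from collections import defaultdict


def tf_idf(term_counts, num_docs):
    """Weight raw {(word, doc): count} values by inverse document frequency.

    Uses the plain variant tf * log(N / df); normalised tf or smoothed idf
    are common alternatives.
    """
    doc_freq = defaultdict(int)
    for word, _doc in term_counts:
        # Each (word, doc) key appears once, so this counts documents per word.
        doc_freq[word] += 1
    return {
        (word, doc): count * math.log(num_docs / doc_freq[word])
        for (word, doc), count in term_counts.items()
    }


if __name__ == "__main__":
    counts = {("the", "poem1.txt"): 2, ("heart", "poem1.txt"): 2,
              ("the", "poem2.txt"): 1, ("desert", "poem2.txt"): 1}
    for key, weight in sorted(tf_idf(counts, num_docs=2).items()):
        print(key, round(weight, 3))
```

A word that appears in every document, like "the" here, gets log(1) = 0 and drops out, which is the point of the IDF factor.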

TF-IDF with Hadoop Streaming in bash – Part 1

So, to commemorate my recent certification, and because my Java absolutely sucks, I decided to implement a common algorithm using Hadoop Streaming.

Hadoop Streaming allows you to write MapReduce code in any language that can process (more...)
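Hadoop Streaming's contract is that the mapper and reducer are ordinary executables that read lines on stdin and write tab-separated key/value lines to stdout, with the framework sorting by key in between. A common way to try a streaming job before submitting it is to mimic that locally as `input | mapper | sort | reducer`. The Python sketch below does this with `subprocess`; the word-count `MAPPER` and `REDUCER` scripts are hypothetical stand-ins (the post itself used bash), and a plain line sort only approximates Hadoop's shuffle on one machine.

```python
import subprocess
import sys

# Hypothetical word-count mapper: reads raw text on stdin, writes "word\t1" lines.
MAPPER = r'''
import sys
for line in sys.stdin:
    for word in line.split():
        print(f"{word.lower()}\t1")
'''

# Hypothetical reducer: reads "key\tcount" lines already sorted by key and
# prints one "key\ttotal" line per key.
REDUCER = r'''
import sys
current, total = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = key, 0
    total += int(value)
if current is not None:
    print(f"{current}\t{total}")
'''


def run_streaming_locally(mapper_cmd, reducer_cmd, input_text):
    """Emulate `cat input | mapper | sort | reducer` on a single machine."""
    mapped = subprocess.run(mapper_cmd, input=input_text,
                            capture_output=True, text=True, check=True).stdout
    # Hadoop sorts mapper output by key before the reduce phase; sorting whole
    # lines is a single-machine stand-in that keeps equal keys adjacent.
    shuffled = "".join(line + "\n" for line in sorted(mapped.splitlines()))
    return subprocess.run(reducer_cmd, input=shuffled,
                          capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    python = sys.executable
    print(run_streaming_locally([python, "-c", MAPPER], [python, "-c", REDUCER],
                                "the red badge\nThe red of courage\n"), end="")
```

Because the runner only passes text through pipes, the mapper and reducer commands could just as well be bash scripts; on a cluster the same two programs would be handed to the streaming jar via its `-mapper` and `-reducer` options.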