Big Data and Analytics Top Ten Trends for 2014

Oracle recently published their view on the top ten trends in “Big Data” and “Analytics” for 2014. Find the list below:

1. Business Users Get Hooked on Mobile Analytics –> Oracle Business Intelligence Mobile App Designer
2. Analytics Take to the Cloud –> Oracle Applications Cloud
3. Hadoop-Based Data Reservoirs Unite with Data Warehouses –> Your Data Warehouse and Hadoop – […]

MapR Sandbox for Hadoop Learning

I got an email about the MapR Sandbox, a fully functional Hadoop cluster running on a virtual machine (CentOS 6.5) that provides an intuitive web interface for both developers and administrators to get started with Hadoop. I believe it's a good way to learn about Hadoop and its ecosystem. Users can download it for VMware or VirtualBox. I downloaded the VirtualBox version and imported it. I changed the network setting to use "Bridged Adapter". After starting it... (more...)

NoSQL? No Thanks

There continues to be a disproportionate amount of hype around 'NoSQL' data stores.  By disproportionate I mean 'completely and utterly out of scale with the actual problems of the vast majority of companies'.  I wrote before about 'how NoSQL became more SQL'.  The point I made there is now more apparent the more I work with companies on Big Data challenges. There are three worlds of data

Big Data : Right Approach Right Solution

 

Hi All,

For the past few months I have been meeting with clients and discussing their potential need for Big Data. The discussion always comes down to one question: do they really need Big Data? My ITNext article, linked below, covers this: as big data gets bigger, IT managers are challenged with the task of identifying data that qualifies as big and finding appropriate solutions to process it.

Click Here To Read Full Article (more...)

Six things to make your Big Data project succeed

I wrote about why your Hadoop project will fail, so I think it's only right that I should follow up with some things that you can do to actually make the Big Data project you take on succeed.  The first thing you need to do is stop trying to make 'Big Data' succeed and instead start focusing on how you educate the business on the value of information and then work out how to deliver new (more...)

Six reasons your Big Data Hadoop project will fail in 2014

Ok, so Hadoop is the bomb, Hadoop is the schizzle, Hadoop is here to solve world hunger and all problems.  Now, I've talked before about some of the challenges around Hadoop for enterprises, but here are six reasons that Information Week is right when it says that Hadoop projects are going to fail more often than not.

1. Hadoop is a Java thing not a BI thing

The first is the most important

2014 technology ambitions – just the two

As this is the traditional time to lay out resolutions for the year, here are my two database-related ones.

To understand more about some of the newer technologies and to advance my use of APEX as a means of providing reporting information around the systems and people I manage.

Having worked with the base Oracle RDBMS, and latterly SQL Server, for around 20 years I must admit that I am not as familiar with such terms (more...)

The Twelve Days of NoSQL: Day Ten: Big Data

On the tenth day of Christmas, my true love gave to me Ten lords a-leaping. The topic of Big Data is often encountered when talking about NoSQL so let’s give it a nod. In 1998, Sergey Brin and Larry Page invented an algorithm for ranking web pages (The Anatomy of a Large-Scale Hypertextual Web Search […]

The Twelve Days of NoSQL: Day Seven: Schemaless Design

On the seventh day of Christmas, my true love gave to me Seven swans a-swimming. As we discussed on Day One, NoSQL consists of “disruptive innovations” that are gaining steam and moving upmarket. So far, we have discussed functional segmentation (the pivotal innovation), sharding, asynchronous replication, eventual consistency (resulting from (more...)

My history with Big Data

Before I joined Cloudera, I hadn't had much formal experience with Big Data. But I had crossed paths with one of its major use cases before, so I found it easy to pick up the mindset. My previous big project involved a relational database hooked up to a web server. (more...)

How Business SOA thinking impacts data

Over the years I've written quite a bit about how SOA, when viewed as a tool for Business Architecture, can change some of the cherished beliefs in IT.  One of these was about how the Single Canonical Form was not for SOA and others have talked about how MDM and (more...)

How To Find Size Of Table In Hive / HDFS

Nov 19, 2013

Hi All

With volume being the constant challenge in Big Data, as an administrator you have to keep tabs on data growth and, at the same time, make sure there is no surge of unwanted objects or folders. Typically you would want to be worried about the (more...)
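As a rough illustration of the kind of check this post is about: the data for a Hive managed table normally sits in a directory on HDFS, so adding up the per-entry sizes that `hadoop fs -du` prints for that directory gives the table's size in bytes. The excerpt does not show the author's own commands, so treat this as a minimal sketch under assumptions: `sum_du_bytes` is a hypothetical helper name, and the warehouse path in the comment is a placeholder that depends on your cluster's configuration.

```shell
#!/usr/bin/env bash

# sum_du_bytes (hypothetical helper): reads `hadoop fs -du <dir>` style output
# on stdin, where the first column of every line is a size in bytes, and
# prints the total. %.0f avoids integer clipping in some awk implementations.
sum_du_bytes() {
  awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# On a cluster this would be used roughly as follows (placeholder path):
#   hadoop fs -du /path/to/warehouse/my_table | sum_du_bytes
# Here the helper is exercised with two made-up lines instead:
printf '100 /t/part-00000\n250 /t/part-00001\n' | sum_du_bytes
```

For a quick interactive look, `hadoop fs -du -s -h <dir>` prints a single human-readable summary for the directory instead.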

RDBMS and their bundle-mates

Relational DBMS used to be fairly straightforward product suites, which boiled down to:

  • A big SQL interpreter.
  • A bunch of administrative and operational tools.
  • Some very optional add-ons, often including an application development tool.

Now, however, most RDBMS are sold as part of something bigger.

Comments on the 2013 Gartner Magic Quadrant for Operational Database Management Systems

The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request, in each case the point being that OLTP (OnLine Transaction Processing) is a dubious term when systems omit strict consistency, and when even strictly consistent systems may (more...)

My Strata + Hadoop World Schedule

Next week is Strata + Hadoop World, which is bound to be exciting for those who deal with big data on a daily basis.  I’ll be spending my time talking about Cloudera Impala at various places, so I’m posting my schedule for those interested in catching up about fast SQL on (more...)

Hadoop Streaming, Hue, Oozie Workflows, and Hive

MapReduce with Hadoop Streaming in bash – Bonus!

MapReduce with Hadoop Streaming in bash – Part 3

In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’–meaning we counted the number of times each word in (more...)
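The term-frequency job described here can be approximated locally with two small bash functions. This is a minimal sketch and not the article's actual scripts: `map_words` and `reduce_counts` are hypothetical names, the sample sentence is made up, and the pipe through `sort` stands in for the shuffle step that Hadoop performs between the map and reduce phases.

```shell
#!/usr/bin/env bash

# Mapper sketch: split stdin into one lowercase word per line.
map_words() {
  tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | grep -v '^$'
}

# Reducer sketch: expects input sorted by word, so identical words are
# adjacent; emits "word<TAB>count" for each distinct word.
reduce_counts() {
  uniq -c | awk '{ print $2 "\t" $1 }'
}

# Local simulation of map -> shuffle/sort -> reduce on a made-up sentence.
printf 'The cat saw the dog\n' | map_words | LC_ALL=C sort | reduce_counts
```

Under Hadoop Streaming, each function would live in its own executable script and be passed to the streaming jar through the `-mapper` and `-reducer` options, with `-input` and `-output` naming the HDFS paths.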

MapReduce with Hadoop Streaming in bash – Part 2

In MapReduce with Hadoop Streaming in bash – Part 1 we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce (more...)

TF-IDF with Hadoop Streaming in bash – Part 1

So to commemorate my recent certification and because my Java absolutely sucks, I decided to do a common algorithm using Hadoop Streaming.

Hadoop Streaming allows you to write MapReduce code in any language that can process (more...)
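To make the TF-IDF idea concrete, here is a single-process sketch of the final scoring step. It is not the author's streaming implementation, which the excerpt does not show. It assumes one common variant of the formula (raw count as term frequency, natural log of total documents over document frequency as inverse document frequency); `tfidf` is a hypothetical function name and the input rows are made up.

```shell
#!/usr/bin/env bash

# tfidf (hypothetical helper): $1 is the total number of documents.
# stdin: one "term<TAB>doc_id<TAB>count" line per (term, document) pair.
# stdout: "term<TAB>doc_id<TAB>score" with score = count * ln(N / df),
# where df is the number of documents that contain the term.
tfidf() {
  awk -F'\t' -v n_docs="$1" '
    { df[$1]++; term[NR] = $1; doc[NR] = $2; tf[NR] = $3 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s\t%s\t%.4f\n", term[i], doc[i], tf[i] * log(n_docs / df[term[i]])
    }'
}

# Made-up counts for two documents: "cat" appears in both, "dog" in one.
printf 'cat\td1\t2\ncat\td2\t1\ndog\td1\t3\n' | tfidf 2
```

A term that appears in every document scores zero, which is the point of the weighting: words common to the whole collection carry little information about any one document.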

Cloudera Certified Developer for Hadoop (CCDH)

Taking the Cloudera Developer Training for Apache Hadoop had many rewards — one of which was a free voucher to take the CCD-410 Exam (normally $295) which you must pass to get CCDH certified. I’m not (more...)