There continues to be a disproportionate amount of hype around 'NoSQL' data stores. By disproportionate I mean 'completely and utterly out of scale with the actual problems of the vast majority of companies'. I wrote before about 'how NoSQL became more SQL', and the point I made there becomes more apparent the more I work with companies on Big Data challenges.
There are three worlds of data
For the past few months I have been meeting with clients to discuss their potential need for Big Data. The discussion comes down to one question: do they really need Big Data? My ITNext article, linked below, explains that as big data grows bigger, IT managers are challenged with identifying which data qualifies as 'big' and finding appropriate solutions to process it.
Click Here To Read Full Article (more...)
So, having written about why your Hadoop project will fail, I think it's only right that I follow up with some things you can do to make the Big Data project you take on actually succeed. The first thing you need to do is stop trying to make 'Big Data' succeed and instead focus on educating the business on the value of information, and then work out how to deliver new (more...)
OK, so Hadoop is the bomb, Hadoop is the schizzle, Hadoop is here to solve world hunger and all our problems. I've talked before about some of the challenges Hadoop poses for enterprises, but here are six reasons why Information Week is right when it says that Hadoop projects are going to fail more often than not.
1. Hadoop is a Java thing, not a BI thing
The first is the most important.
As this is the traditional time to lay out resolutions for the year, here are my two database-related ones:
- To understand more about some of the newer technologies.
- To advance my use of APEX as a means of providing reporting information about the systems and people I manage.
Having worked with the base Oracle RDBMS, and latterly SQL Server, for around 20 years, I must admit that I am not as familiar with such terms (more...)
On the tenth day of Christmas, my true love gave to me ten lords a-leaping. The topic of Big Data often comes up when talking about NoSQL, so let’s give it a nod. In 1998, Sergey Brin and Larry Page invented an algorithm for ranking web pages (The Anatomy of a Large-Scale Hypertextual Web Search […]
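The algorithm Brin and Page described is PageRank. As a rough illustration only, here is a minimal power-iteration sketch in Python, assuming the common textbook formulation with a damping factor of 0.85 and with the rank of pages that have no outbound links spread evenly over all pages. The `pagerank` function and the three-page link graph are hypothetical, not taken from the original post.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph `c` ends up ranked highest because it receives links from both `a` and `b`, and the ranks always sum to 1.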
On the seventh day of Christmas, my true love gave to me seven swans a-swimming. As we discussed on Day One, NoSQL consists of “disruptive innovations” that are gaining steam and moving upmarket. So far, we have discussed functional segmentation (the pivotal innovation), sharding, asynchronous replication, eventual consistency (resulting from (more...)
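Sharding splits one logical data set across several independent stores, each holding a subset of the keys. A minimal sketch, assuming simple hash-modulo placement (one of several possible schemes, not necessarily the one the series describes); `shard_for` and `partition` are hypothetical helpers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard for a key using a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard they would be stored on."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


users = [f"user:{i}" for i in range(100)]
layout = partition(users, 4)
```

Hash-modulo placement spreads keys evenly, but changing `num_shards` remaps most keys; consistent hashing is the usual way to reduce that data movement.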
Before I joined Cloudera, I hadn't had much formal experience with Big Data. But I had crossed paths with one of its major use cases before, so I found it easy to pick up the mindset.
My previous big project involved a relational database hooked up to a web server. (more...)
Over the years I've written quite a bit about how SOA, when viewed as a tool for Business Architecture, can change some of the cherished beliefs in IT. One of these posts argued that the Single Canonical Form was not for SOA, and others have talked about how MDM and (more...)
Volume is the constant challenge with Big Data. As an administrator, you have to keep tabs on data growth and, at the same time, make sure there is no unchecked growth of unwanted objects or folders. Typically you would want to be worried about the (more...)
Relational DBMS used to be fairly straightforward product suites, which boiled down to:
- A big SQL interpreter.
- A bunch of administrative and operational tools.
- Some very optional add-ons, often including an application development tool.
Now, however, most RDBMS are sold as part of something bigger.
The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request; in each case the point is that OLTP (Online Transaction Processing) is a dubious term when systems omit strict consistency, and when even strictly consistent systems may (more...)
Recently I have been spending most of my time on Big Data projects using CDH 4.x. Understanding the key components of the Hadoop infrastructure is essential, but MapReduce (MR) is the most important for processing and aggregating data. To get the best performance, one needs to know (more...)
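To make the map, shuffle and reduce phases concrete, here is a single-process Python sketch of the data flow only. It ignores everything that makes Hadoop fast (input splits, combiners, partitioners, spills to disk), and `run_mapreduce` plus the sales records are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate MapReduce: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values for the same key together.
            groups[key].append(value)
    # Reduce phase: keys are processed in sorted order, as Hadoop sorts map output.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def sales_mapper(line):
    """Parse a hypothetical 'region,amount' record."""
    region, amount = line.split(",")
    yield region, float(amount)


def sum_reducer(key, values):
    """Aggregate all amounts seen for one region."""
    yield key, sum(values)


sales = ["east,10.0", "west,5.5", "east,2.5"]
print(run_mapreduce(sales, sales_mapper, sum_reducer))
# [('east', 12.5), ('west', 5.5)]
```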
Next week is Strata + Hadoop World, which is bound to be exciting for those who deal with big data on a daily basis. I’ll be spending my time talking about Cloudera Impala in various places, so I’m posting my schedule for those interested in hearing about fast SQL on (more...)
[This entry is part 3 of 3 in the series Hadoop Streaming]
In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’, meaning we counted the number of times each word in (more...)
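The series implements this with bash scripts. As a language-neutral illustration of what the job computes, here is a Python sketch that counts each word per document, keyed by (document, word) so the same word in two poems is counted separately. The tokenizing regular expression and the sample texts are assumptions, not Crane's poems.

```python
import re
from collections import Counter


def term_frequency(documents):
    """Turn {document name: text} into {(document, word): count}."""
    counts = Counter()
    for name, text in documents.items():
        # Lowercase and treat runs of letters/apostrophes as words (an assumption).
        for word in re.findall(r"[a-z']+", text.lower()):
            counts[(name, word)] += 1
    return dict(counts)


docs = {"poem1": "The sea, the sky.", "poem2": "The sky!"}
tf = term_frequency(docs)
```

The composite key plays the same role as the key a Streaming mapper would emit, so the reducer only ever sums counts for one word in one document.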
[This entry is part 2 of 2 in the series Hadoop Streaming]
In MapReduce with Hadoop Streaming in bash – Part 1, we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce (more...)
[This entry is part 1 of 1 in the series Hadoop Streaming]
So, to commemorate my recent certification, and because my Java absolutely sucks, I decided to implement a common algorithm using Hadoop Streaming.
Hadoop Streaming allows you to write MapReduce code in any language that can process (more...)
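Hadoop Streaming feeds input lines to the mapper's standard input and, by default, treats everything up to the first tab in each output line as the key; the framework sorts mapper output by key before piping it to the reducer's standard input. A minimal word-count sketch of that contract in Python (the post itself uses bash, and the file name `wordcount.py` is hypothetical):

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts per word; relies on input being sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    # Sorted input means all lines for one word arrive consecutively.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(argv):
    """Run one role ('map' or 'reduce') over standard input."""
    step = {"map": mapper, "reduce": reducer}[argv[1]]
    for out in step(sys.stdin):
        print(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

The job can be tested locally with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for Hadoop's shuffle; on a cluster the same script is passed through the streaming jar's `-mapper` and `-reducer` options.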
Replication and fault tolerance are built-in features of Hadoop, and I was always curious to know how blocks are replicated. I found the answer while reading Chapter 3, “The Hadoop Distributed Filesystem”, of “Hadoop: The Definitive Guide” (3rd edition), and thought it would be interesting to share.
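For the default replication factor of three, the placement policy described in that chapter puts the first replica on the writer's node (or a random node if the client runs outside the cluster), the second on a node in a different rack, and the third on a different node in the same rack as the second. A simplified Python sketch of that rule, assuming a hypothetical rack-to-nodes topology with at least two racks and at least two nodes per rack; the real NameNode applies further checks (for example, avoiding nodes that are too busy or too full), which are ignored here.

```python
import random


def place_three_replicas(topology, writer_node=None, rng=None):
    """topology maps a rack name to the list of node names in that rack (hypothetical cluster)."""
    rng = rng or random.Random()
    node_rack = {node: rack for rack, nodes in topology.items() for node in nodes}

    # 1st replica: the writer's own node if it is a cluster node, otherwise a random node.
    if writer_node in node_rack:
        first = writer_node
    else:
        first = rng.choice(sorted(node_rack))

    # 2nd replica: a random node on a different rack from the first (off-rack copy).
    off_racks = sorted(r for r in topology if r != node_rack[first])
    second_rack = rng.choice(off_racks)
    second = rng.choice(topology[second_rack])

    # 3rd replica: a different node on the same rack as the second.
    third = rng.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third]


cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
replicas = place_three_replicas(cluster, writer_node="n1", rng=random.Random(0))
```

Keeping two replicas on one rack limits cross-rack write traffic, while the off-rack copy means losing an entire rack does not lose the block.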
[This entry is part 5 of 5 in the series Cloudera Hadoop Training]
Taking the Cloudera Developer Training for Apache Hadoop had many rewards, one of which was a free voucher to take the CCD-410 exam (normally $295), which you must pass to get CCDH certified. I’m not (more...)