On the tenth day of Christmas, my true love gave to me Ten lords a-leaping. The topic of Big Data is often encountered when talking about NoSQL, so let’s give it a nod. In 1998, Sergey Brin and Larry Page invented an algorithm for ranking web pages (The Anatomy of a Large-Scale Hypertextual Web Search […]
For the last few weeks I have been engaged with a customer, helping them with the remediation of an Endeca project. During remediation, I faced a typical challenge: all the graphs and EQLs were erroring out. After doing some research, I found out that it's a known issue. I spent a good amount (more...)
Wishing all you readers a very happy new year! 2013 is over and the dawn of 2014 has arrived. It just feels like yesterday, and now we are here sitting and waiting for the year number to change. As I am writing this blog, Australia, Mumbai and Dubai (more...)
On the seventh day of Christmas, my true love gave to me Seven swans a-swimming. As we discussed on Day One, NoSQL consists of “disruptive innovations” that are gaining steam and moving upmarket. So far, we have discussed functional segmentation (the pivotal innovation), sharding, asynchronous replication, eventual consistency (resulting from (more...)
There is an interesting article on Forbes where Paul Sonderegger from Oracle is making the case that you have to jump onto the “Big Data” bandwagon without delay if you want to avoid your big-data-using competitors crushing you.
But he would say that, wouldn’t he?
In reality, most companies already (more...)
When doing the Business Data Lake pieces, it took me back to a view that I had around SOA: that you should take the consumer's view when designing a service. This, I think, is more critical when looking at analytics and reporting, where it really is all about the consumption.
Over the years I've written quite a bit about how SOA, when viewed as a tool for Business Architecture, can change some of the cherished beliefs in IT. One of these was about how the Single Canonical Form was not for SOA and others have talked about how MDM and (more...)
So what is Big Data? It's Variety, Velocity, Volume, right? But what does that really mean? Should I get loads of data and drop it into Hadoop, pull in anything I can lay my hands on and I'm now 'doing Big Data'?
Should I plug in my packet monitoring software (more...)
With volume being the constant challenge in Big Data, as an administrator you have to keep tabs on data growth while making sure there is no surge in unwanted objects or folders. Typically you would want to be worried about the (more...)
Recently I have been spending most of my time on Big Data projects using CDH 4.X. Understanding the key components of the Hadoop infrastructure is necessary, but MapReduce (MR) is the most important for processing and aggregating data. To get the best performance, one needs to know (more...)
[This entry is part 3 of 3 in the series Hadoop Streaming]
In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’, meaning we counted the number of times each word in (more...)
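As a rough illustration of the term-frequency idea described above, here is a minimal Python sketch that mimics the map, sort, and reduce phases locally, without a Hadoop cluster. The original series uses bash scripts with Hadoop Streaming; the function names `mapper`, `reducer`, and `term_frequency` and the sample lines are hypothetical and are not taken from those articles.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce step: sum the counts per word. Pairs must arrive sorted by word."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def term_frequency(lines):
    """Run map, sort, and reduce in-process; sorted() stands in for the shuffle/sort phase."""
    shuffled = sorted(mapper(lines))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    # Hypothetical sample input, not the poems used in the original article.
    sample = ["the sea the sky", "the wind"]
    for word, count in term_frequency(sample).items():
        print(f"{word}\t{count}")
```

With Hadoop Streaming, the map and reduce steps would instead be separate scripts reading from standard input and writing tab-separated key/value lines to standard output, with the framework handling the sort between them.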
[This entry is part 2 of 2 in the series Hadoop Streaming]
In MapReduce with Hadoop Streaming in bash – Part 1 we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce (more...)
[This entry is part 1 of 1 in the series Hadoop Streaming]
So to commemorate my recent certification and because my Java absolutely sucks, I decided to do a common algorithm using Hadoop Streaming.
Hadoop Streaming allows you to write MapReduce code in any language that can process (more...)
With replication and fault tolerance being inbuilt features of Hadoop, I was always curious to know how blocks are replicated. I got this information while reading “Hadoop: The Definitive Guide”, 3rd edition, in chapter 3, “The Hadoop Distributed Filesystem”, and thought it would be interesting to share.
[This entry is part 5 of 5 in the series Cloudera Hadoop Training]
Taking the Cloudera Developer Training for Apache Hadoop had many rewards — one of which was a free voucher to take the CCD-410 Exam (normally $295) which you must pass to get CCDH certified. I’m not (more...)
There are few things out there in IT more delusional than
the Single Canonical Form, the idea that IT can define a super schema, a schema
so complete, so pure that all will bow down before it. Sheer idolatry. Whether it is for integration or for Data
The Oracle guys running the Big Data 4 the Enterprise Meetup
are always apologetic about marketing. The novelty is quite amusing. They do this because most Big Data Meetups are full of brash young people from small start-ups who use cool open source software. They choose cool open source software (more...)
Out of the box, Hadoop provides a benchmarking mechanism for your cluster. Running it on a Cloudera cluster was a fun ride, so I thought I would share the experience to reduce the pain and increase the fun.
Before you begin anything, set HADOOP_HOME. The command below (more...)
While running a simple
beeline > select count(*) from samvi_test_table;
I got the following error:
Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker. (more...)