Before I joined Cloudera, I hadn't had much formal experience with Big Data. But I had crossed paths with one of its major use cases before, so I found it easy to pick up the mindset.
My previous big project involved a relational database hooked up to a web server. (more...)
Over the years I've written quite a bit about how SOA, when viewed as a tool for Business Architecture, can change some of the cherished beliefs in IT. One of these was about how the Single Canonical Form was not for SOA and others have talked about how MDM and (more...)
Volume on BigData being the constant challenge, as an administrator, you will have to keep a tab on the data growth, at the same time you need to make sure there is spurge growth of unwanted objects or folders. Typically you would want to be worried about the (more...)
Relational DBMS used to be fairly straightforward product suites, which boiled down to:
- A big SQL interpreter.
- A bunch of administrative and operational tools.
- Some very optional add-ons, often including an application development tool.
Now, however, most RDBMS are sold as part of something bigger.
The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request, in each case the point being that OLTP (OnLine Transaction Processing) is dubious term when systems omit strict consistency, and when even strictly consistent systems may (more...)
Recently I have spending most of my time on Big Data projects,using CDH 4.X. Understanding key component of hadoop infrastruture is very necessary, But the MapReduce (MR) is the most important for processing and aggregrating data. For getting the best of the performance, one needs to know (more...)
Next week is Strata + Hadoop World which is bound to be exciting for those who deal with big data on a daily basis. I’ll be spending my time talking about Cloudera Impala at various places so I’m posting my schedule for those interesting in catching about fast SQL on (more...)
[This entry is part 3 of 3 in the series Hadoop Streaming
In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’–meaning we counted the number of times each word in (more...)
[This entry is part 2 of 2 in the series Hadoop Streaming
In MapReduce with Hadoop Streaming in bash – Part 1 we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce (more...)
[This entry is part 1 of 1 in the series Hadoop Streaming
So to commemorate my recent certification and because my Java absolutely sucks, I decided to do a common algorithm using Hadoop Streaming.
Hadoop Streaming allows you to write MapReduce code in any language that can process (more...)
With replication and fault tolerance, an inbuilt feature of Hadoop. I was always curious to know how blocks are replicated. Got this information while reading “Hadoop The Definitive Guide Edition – 3 ” in chapter 3 “The Hadoop Distributed Filesystem”. Thought would be interesting to share.
[This entry is part 5 of 5 in the series Cloudera Hadoop Training
Taking the Cloudera Developer Training for Apache Hadoop had many rewards — one of which was a free voucher to take the CCD-410 Exam (normally $295) which you must pass to get CCDH certified. I’m not (more...)
Here is the poll data from the Confio-sponsored webinar “NoSQL and Big Data for the Oracle DBA” and my answers to the questions asked in the chat. The recording of the webinar is now available at http://www.confio.com/webinars/nosql-big-data/. The slide deck is at http://iggyfernandez.files.wordpress.com/2013/10/nosql-and-big-data-for-oracle-dbas-oct-2013.pdf. Are (more...)
Out of the box hadoop provides a benchmarking mechanism for your cluster. While doing the same on Cloudera cluster, it was a fun ride, hence thought will share the same to reduce the pain and increase the fun.
Before you begin anything, set the HADOOP_HOME.The below command (more...)
In my webinar for Confio on October 10, I will explain that the deficiencies of relational technology are actually a result of deliberate choices made by the relational movement in its early years. The relational camp needs to revisit these choices if it wants to compete with NoSQL and Big (more...)
While while running a simple
beeline > select count(*) from samvi_test_table;
Got the following error.
Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker. (more...)
Reblogged from So Many Oracle Manuals, So Little Time: See also : No! to SQL and No! to NoSQL The inventor of the relational model, Dr. Edgar “Ted” Codd believed that the suppression of physical database design details was the chief advantage of the relational model. He made the case (more...)
[This entry is part 4 of 4 in the series Cloudera Hadoop Training
This will be a short(ish) post, as my brain is relatively fried from the obscene amount of knowledge imparted by the Hadoop class (and it’s Friday so I’m allowed to be lazy nanny nanny boo boo).
Yesterday I went to the Big Data machine engineered systems demo grounds, to get an insight, exclusive demo from Dmitry Lychagin. Dmitry, being part of the XDB team explained with a lot of enthusiasm the new XQuery connector for Hadoop. Although currently not yet released, but available shortly, he demonstrated (more...)