Over the years I've written quite a bit about how SOA, when viewed as a tool for Business Architecture, can change some of the cherished beliefs in IT. One of these was about how the Single Canonical Form was not for SOA and others have talked about how MDM and (more...)
So what is Big Data? It's Variety, Velocity, Volume, right? But what does that really mean? Should I get loads of data and drop it into Hadoop, pull in anything I can lay my hands on, and then claim I'm 'doing Big Data'?
Should I plug in my packet monitoring software (more...)
With volume being the constant challenge in Big Data, as an administrator you have to keep tabs on data growth while also making sure there is no surge in unwanted objects or folders. Typically you would want to be worried about the (more...)
Over the past few months I’ve been posting a number of articles about Hadoop, and how you can connect to it from ODI and OBIEE. From an ODI perspective, I covered Hadoop as one of a number of new data sources ODI11g could connect to, then looked at how (more...)
Oracle OpenWorld is a monster event – 10Ks of attendees, thousands of sessions and 100Ks of private conversations that all help convey and define the message about Oracle’s strategy and the roadmap for its close to 4,000 products. Concurrently with OOW is the JavaOne conference that – à la (more...)
The other day I posted an article on the blog about connecting OBIEE 11.1.1.7 to Cloudera Impala, a new “in-memory” SQL engine for Hadoop that’s much faster than Hive for interactive queries. In this example, I connected OBIEE 11.1.1.7 to the Cloudera Quickstart (more...)
A few months ago I posted a series of articles about connecting OBIEE 11.1.1.7, Exalytics and ODI to Apache Hadoop through Hive, an SQL-interface layer for Hadoop. Hadoop/Hive connectivity is a cool new feature in OBIEE 11g but suffers from the problem common to Hive (more...)
“The devil can cite Scripture for his purpose. An evil soul producing holy witness Is like a villain with a smiling cheek, A goodly apple rotten at the heart: O, what a goodly outside falsehood hath!” —from The Merchant of Venice by William Shakespeare Dr. Codd’s support for non-relational (more...)
Recently I have been spending most of my time on Big Data projects using CDH 4.X. Understanding the key components of the Hadoop infrastructure is necessary, but MapReduce (MR) is the most important one for processing and aggregating data. To get the best performance, one needs to know (more...)
[This entry is part 3 of 3 in the series Hadoop Streaming]
In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’–meaning we counted the number of times each word in (more...)
[This entry is part 2 of 2 in the series Hadoop Streaming]
In MapReduce with Hadoop Streaming in bash – Part 1 we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce (more...)
[This entry is part 1 of 1 in the series Hadoop Streaming]
So to commemorate my recent certification and because my Java absolutely sucks, I decided to do a common algorithm using Hadoop Streaming.
Hadoop Streaming allows you to write MapReduce code in any language that can process (more...)
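As a rough illustration of the term-frequency idea behind this series, here is a minimal Python sketch. The series itself uses bash, so this is a swapped-in language, and the function names (`map_words`, `reduce_counts`, `run_locally`) are hypothetical. It assumes the usual Hadoop Streaming convention of tab-separated key/value lines, and it stands in for the framework's shuffle with a plain local sort.

```python
import re
from itertools import groupby


def map_words(lines):
    """Mapper: emit one 'word<TAB>1' line per word, lower-cased."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield f"{word}\t1"


def reduce_counts(sorted_lines):
    """Reducer: sum the counts of consecutive lines that share a key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reduce_counts(sorted(map_words(lines))))
```

Calling `run_locally` on a list of text lines returns one `word<TAB>count` string per distinct word. In a real streaming job the mapper and reducer would be separate scripts reading standard input and writing standard output, with Hadoop doing the sort between them.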
With replication and fault tolerance being inbuilt features of Hadoop, I was always curious to know how blocks are replicated. I got this information while reading “Hadoop: The Definitive Guide”, 3rd Edition, in chapter 3, “The Hadoop Distributed Filesystem”, and thought it would be interesting to share.
[This entry is part 5 of 5 in the series Cloudera Hadoop Training]
Taking the Cloudera Developer Training for Apache Hadoop had many rewards — one of which was a free voucher to take the CCD-410 Exam (normally $295) which you must pass to get CCDH certified. I’m not (more...)
Here is the poll data from the Confio-sponsored webinar “NoSQL and Big Data for the Oracle DBA” and my answers to the questions asked in the chat. The recording of the webinar is now available at http://www.confio.com/webinars/nosql-big-data/. The slide deck is at http://iggyfernandez.files.wordpress.com/2013/10/nosql-and-big-data-for-oracle-dbas-oct-2013.pdf. Are (more...)
There are few things out there in IT more delusional than the Single Canonical Form, the idea that IT can define a super schema, a schema so complete, so pure that all will bow down before it. Sheer idolatry. Whether it is for integration or for Data
The Oracle guys running the Big Data 4 the Enterprise Meetup are always apologetic about marketing. The novelty is quite amusing. They do this because most Big Data Meetups are full of brash young people from small start-ups who use cool open source software. They choose cool open source software (more...)
Out of the box, Hadoop provides a benchmarking mechanism for your cluster. Running it on a Cloudera cluster was a fun ride, so I thought I would share the steps to reduce the pain and increase the fun.
Before you begin anything, set HADOOP_HOME. The command below (more...)
In my webinar for Confio on October 10, I will explain that the deficiencies of relational technology are actually a result of deliberate choices made by the relational movement in its early years. The relational camp needs to revisit these choices if it wants to compete with NoSQL and Big (more...)