Need for Defining Reference Architecture For Big Data

Hi Fellow Big Data Admirers,

With big data and analytics playing an influential role in helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos but instead to take a holistic approach toward it and define a base reference architecture even before contemplating positioning the necessary tools.

My latest print media article (5th in the series) for CIO magazine (ITNEXT) talks extensively about the need for a reference architecture in (more...)

Data Lakes will replace EDWs – a prediction

Over the last few years there has been a trend of increased spending on BI, and that trend isn't going away. The analyst predictions, however, have understandably been based on the mentality that the choice was between a traditional EDW/DW model and Hadoop. With the new 'Business Data Lake' type of hybrid approach, it's pretty clear that the shift is underway for all vendors to have a hybrid

MapR Sandbox for Hadoop Learning

I got an email about the MapR Sandbox, a fully functional Hadoop cluster running on a virtual machine (CentOS 6.5) that provides an intuitive web interface for both developers and administrators to get started with Hadoop. I believe it's a good idea to learn about Hadoop and its ecosystem. Users can download it for VMware or VirtualBox. I downloaded the VirtualBox version and imported it, then changed the network setting to use "Bridged Adapter". After started... (more...)

NoSQL? No Thanks

There continues to be a disproportionate amount of hype around 'NoSQL' data stores.  By disproportionate I mean 'completely and utterly out of scale with the actual problems of the vast majority of companies'.  I wrote before about 'how NoSQL became more SQL'.  The point I made there is now more apparent the more I work with companies on Big Data challenges. There are three worlds of data

Big Data : Right Approach Right Solution

 

Hi All,

For the past few months I have been meeting with clients and discussing their potential need for Big Data. The discussion gets to the bottom of one question: do they really need Big Data? The link below to my ITNext article talks about how, as big data gets bigger, IT managers are challenged with the task of identifying data that qualifies as big and finding appropriate solutions to process it.

Click Here To Read Full Article (more...)

My history with Big Data

Before I joined Cloudera, I hadn't had much formal experience with Big Data. But I had crossed paths with one of its major use cases before, so I found it easy to pick up the mindset. My previous big project involved a relational database hooked up to a web server. (more...)

RDBMS and their bundle-mates

Relational DBMS used to be fairly straightforward product suites, which boiled down to:

  • A big SQL interpreter.
  • A bunch of administrative and operational tools.
  • Some very optional add-ons, often including an application development tool.

Now, however, most RDBMS are sold as part of something bigger.

Comments on the 2013 Gartner Magic Quadrant for Operational Database Management Systems

The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request, in each case the point being that OLTP (OnLine Transaction Processing) is a dubious term when systems omit strict consistency, and when even strictly consistent systems may (more...)

My Strata + Hadoop World Schedule

Next week is Strata + Hadoop World, which is bound to be exciting for those who deal with big data on a daily basis. I’ll be spending my time talking about Cloudera Impala at various places, so I’m posting my schedule for those interested in hearing about fast SQL on (more...)

New York Oracle User Group Fall Conference Materials

Thank you to all who attended my sessions at the NYOUG Fall Conference this morning. I appreciate you spending your most precious commodity - your time - with me. I sincerely hope you found the presentations both enlightening and entertaining.

Please see the details of the sessions below along with the (more...)

Upcoming Talks: OakTable World and Strata + Hadoop World

I haven’t had much time over the past year to do many blog posts, but in the next few months I’ll be doing a few talks about what I’ve been working on over that time, Cloudera Impala, an Open Source MPP SQL query engine for Hadoop.  Hope to see (more...)

Oracle Corp at useR! Conference 2013 #useR2013 #rstats

This year’s R User Conference happened in Albacete (Spain). The conference has been gathering R professionals and enthusiasts from all over the world since 2004, when it first began in Vienna. The sponsors this year were Revolution Analytics, Google, RStudio, Oracle, and TIBCO. Other companies like OpenAnalytics and Mango Solutions were also present with a booth. Besides sponsoring the (more...)

The 3 ways Hadoop will change your Business Intelligence

“It’s the analytics, stupid!” Obviously no offense is intended to the dear reader. It’s a wake-up call for all the people who are excited about Hadoop but lack BI vision. The BI people who lack infrastructure vision are also to blame. Blame for what? We’ll see later in this (more...)

InfoQ : Running the Largest Hadoop DFS Cluster

Since I attended a Big Data event, Frankfurter Datenbanktage 2013, I have started to take a look at non-relational techniques too. The RDBMS is not the correct, fitting answer to every data-related IT challenge.

Frequently I wondered about how Facebook (more...)

What’s all the fuss about Big Data?

Mar 6, 2013


Big Data is the collective term for data sets so large and potentially complex that they are difficult to handle using traditional tools and applications such as Relational Database Management Systems. Scientists in the fields of physics, genetics and meteorology were among the earlier examples of those who encountered Big Data.

 

However,

Everything you ever wanted to know about Big Data, but had no PDF to carry around!

Back in March 2012 I experienced an air mileage overflow: almost straight from Madrid I took a flight to Israel to speak at a Big Data conference, only to be back in Lisbon and fly again to Johannesburg in South Africa to meet several customers in the retail and manufacturing area. Back in Lisbon, I packed again for London [...]

Hadoop! What is it good for? Absolutely … everything!

In times of hysteria people tend to use their reptilian brain. This sub-brain, which has been with us since we were fish or tadpoles, is what kicks in when we face the unknown. In computer science or information technology, organizations tend to hold on to emotions and rely less and less on reasoning. Could it be [...]

Comic: How to write CV for NoSQL

Original Post can be viewed at Comic: How to write CV for NoSQL

This is a pretty old comic from Geek&Poke. Enjoy.

AskDba.org Weblog

Coming Out


The last 16 years or so of my professional life have been dedicated to working on problems (and solutions!) in transactional middleware - by this, I mean systems that provide strong consistency guarantees: reliable queueing, distributed 2PC engines, higher level quality of service guarantees for lower level protocols (IIOP, (more...)

Linux 6 Transparent Huge Pages and Hadoop Workloads

This past week I spent some time setting up and running various Hadoop workloads on my CDH cluster. After some Hadoop jobs had been running for several minutes, I noticed something quite alarming: the system CPU percentages were extremely high.
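The excerpt is cut off before the diagnosis, so what follows is only a rough sketch of how one might quantify "system CPU" on a Linux node and check the Transparent Huge Pages setting; it is not the author's method. The function names are hypothetical. The sketch assumes the aggregate `cpu` line of /proc/stat (fields: user, nice, system, idle, iowait, irq, softirq, steal) and the THP sysfs file, which Red Hat 6 kernels expose under a `redhat_` prefixed directory.

```python
import re
from pathlib import Path

# Assumed sysfs locations: the Red Hat 6 kernel name first, then the upstream name.
THP_PATHS = (
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",
    "/sys/kernel/mm/transparent_hugepage/enabled",
)


def parse_thp_setting(text):
    """Return the active value from text such as '[always] madvise never'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_setting():
    """Return the active THP mode, or None if no known sysfs file exists."""
    for candidate in THP_PATHS:
        path = Path(candidate)
        if path.exists():
            return parse_thp_setting(path.read_text())
    return None


def cpu_ticks(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Keep user..steal only; guest time is already counted inside user time.
    return [int(value) for value in fields[1:9]]


def system_cpu_percent(before, after):
    """Percentage of elapsed ticks spent in system (kernel) mode between two samples."""
    deltas = [b - a for a, b in zip(cpu_ticks(before), cpu_ticks(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[2] / total  # index 2 is the 'system' field
```

To use it, read the first line of /proc/stat twice, a second or so apart, pass both strings to `system_cpu_percent`, and compare the result with what `read_thp_setting()` reports on the same node.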

Platform Details

This cluster is comprised of 2s8c16t Xeon L5630 nodes with 96 GB of RAM running CentOS Linux 6.2 with Java 1.6.0_30. The details of those are:

$ cat /etc/redhat-release
CentOS release 6.2 (Final)

$ uname -a
Linux chaos 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012  (more...)