Technology preview – Oracle XQuery for Hadoop (New Big Data Connector)

Yesterday I went to the Big Data machine engineered systems demo grounds to get an inside, exclusive demo from Dmitry Lychagin. Dmitry, who is part of the XDB team, explained the new XQuery connector for Hadoop with a lot of enthusiasm. Although it is not yet released, it will be available shortly, and he demonstrated (more...)

Spring JDBC with PivotalHD and Hawq

HAWQ enables SQL on Hadoop, which means we can use something like Spring JDBC to query it. In this example we use the PivotalHD VM with data from a HAWQ append-only table, as shown below.

  
gpadmin=# \dt
                             List of relations
   Schema    |            Name             | Type  |  Owner  |   Storage   
-------------+-----------------------------+-------+---------+-------------
  (more...)
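The rest of the walk-through sits behind the cut, but the idea is simple: HAWQ speaks the PostgreSQL wire protocol, so a plain Spring JdbcTemplate on top of the PostgreSQL JDBC driver can query the tables listed above. The sketch below is illustrative rather than the post's own code - the VM hostname, port, credentials and table name are all assumptions.

// Minimal sketch, assuming the PivotalHD VM exposes HAWQ on the standard
// PostgreSQL port; swap in one of the relations reported by \dt above.
import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

public class HawqQueryExample {

    public static void main(String[] args) {
        // HAWQ is reached through the ordinary PostgreSQL JDBC driver.
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("org.postgresql.Driver");
        ds.setUrl("jdbc:postgresql://pivhdsne:5432/gpadmin"); // hypothetical VM host and database
        ds.setUsername("gpadmin");
        ds.setPassword("gpadmin");

        JdbcTemplate jdbc = new JdbcTemplate(ds);

        // Table name is a placeholder; use one of the append-only tables from the listing.
        List<Map<String, Object>> rows =
                jdbc.queryForList("SELECT * FROM retail_demo.customers_dim LIMIT 5");
        for (Map<String, Object> row : rows) {
            System.out.println(row);
        }
    }
}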

New York Oracle User Group Fall Conference Materials

Thank you to all who attended my sessions at the NYOUG Fall Conference this morning. I appreciate you spending your most precious commodity - your time - with me. I sincerely hope you found both presentations enlightening as well as entertaining.

Please see the details of the sessions below along with the (more...)

Cloudera Sentry and other security subjects

I chatted with Charles Zedlewski of Cloudera on Thursday about security — especially Cloudera’s new offering Sentry — and other Hadoop subjects.

Sentry is:

  • Developed by Cloudera.
  • An Apache incubator project.
  • Slated to be rolled into CDH — Cloudera’s Hadoop distribution — over the next couple of weeks.
  • Only useful (more...)

Upcoming Talks: OakTable World and Strata + Hadoop World

I haven’t had much time over the past year to do many blog posts, but in the next few months I’ll be doing a few talks about what I’ve been working on over that time: Cloudera Impala, an open source MPP SQL query engine for Hadoop. Hope to see (more...)

“Disruption” in the software industry

I lampoon the word “disruptive” for being badly overused. On the other hand, I often refer to the concept myself. Perhaps I should clarify. :)

You probably know that the modern concept of disruption comes from Clayton Christensen, specifically in The Innovator’s Dilemma and its sequel, The Innovator’s Solution. The basic (more...)

Minimum on the wire, everything in the record

I've talked before about why large canonical models are a bad idea and how MDM makes SOA, BPM and a whole lot of things easier.  This philosophy of 'minimum on the wire' helps to create more robust infrastructures that don't suffer from a fragile base class problem and better match (more...)

Google and Yahoo have it easy or why Hadoop is only part of the story

We hear lots and lots of hype at the moment around Hadoop, and it is a great technology approach, but there is also lots of talk about how this approach will win because Google and Yahoo are using it to manage their scale, and thus that this shows their approach (more...)

Oracle Corp at useR! Conference 2013 #useR2013 #rstats

This year’s R User Conference took place in Albacete (Spain). The conference has been gathering R professionals and enthusiasts from all over the world since 2004, when it first began in Vienna. The sponsors this year were Revolution Analytics, Google, RStudio, Oracle, and TIBCO. Other companies, like OpenAnalytics and Mango Solutions, were also present with booth stands. Besides sponsoring the (more...)

Demystifying Big Data for Oracle Professionals

Ever wonder about Big Data and what exactly it means, especially if you are already an Oracle Database professional? Or, do you get lost in the jargon warfare that spews out terms like Hadoop, Map/Reduce and HDFS? In this post I will attempt to explain these terms from the perspective (more...)
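The full explanation is behind the cut; purely to put a concrete shape on the term "Map/Reduce", here is the classic word-count job written against the standard Hadoop Java API (it is not from the post itself). The map step emits a (word, 1) pair for every word read from files in HDFS, and the reduce step sums those counts per word; input and output paths are whatever you pass on the command line.

// Illustrative only: the canonical Hadoop word-count MapReduce job.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: for each input line, emit (word, 1) for every word it contains.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each distinct word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}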

The Hadoop hump – why enterprises struggle to move from Proof of Concept to Enterprise deployment

At the recent Hadoop Summit in Amsterdam I noticed something that has been bothering me for a while. Lots of companies have done some great Proofs of Concept with Hadoop, but they are rarely turning those into full-blown operational solutions. To be clear, I'm not talking about the shiny, shiny (more...)

The 3 ways Hadoop will change your Business Intelligence

“It’s the analytics, stupid!” Obviously no offense is intended toward the dear reader. It’s a wake-up call for all the people who are excited about Hadoop but lack BI vision. The BI people who lack infrastructure vision are also to blame. Blame for what? We’ll see later in this (more...)

InfoQ : Running the Largest Hadoop DFS Cluster

Since I attended a Big Data event - Frankfurter Datenbanktage 2013 - I have also started to look at non-relational techniques. The RDBMS is not, in every aspect, the correct, fitting and fulfilling answer to all data-related IT challenges.

I have frequently wondered how Facebook (more...)

What’s all the fuss about Big Data?

Big Data is the collective term for very large and potentially complex data sets - sets so large that it is difficult to handle the data using traditional tools and applications such as Relational Database Management Systems. Scientists in the fields of physics, genetics and meteorology were early examples of those who encountered Big Data.

However,

Everything you ever wanted to know about Big Data, but had no PDF to carry around!

Back in March 2012 I experienced an air mileage overflow: almost straight from Madrid I picked a flight to Israel to speak at a Big Data conference, only to return to Lisbon and fly again to Johannesburg, South Africa, to meet several customers in the retail and manufacturing area. Back in Lisbon, I packed again for London [...]

Hadoop! What is it good for? Absolutely … everything!

In times of hysteria people tend to use their reptilian brain. This sub-brain, which has been with us since we were fish, or tadpoles, is what kicks in when we face the unknown. In computer science and information technology, organizations tend to hold on to emotions and rely less and less on reasoning. Could it be [...]

Comic: How to write CV for NoSQL

Original Post can be viewed at Comic: How to write CV for NoSQL

This is a pretty old comic from Geek & Poke. Enjoy!


Coming Out


The last 16 years or so of my professional life have been dedicated to working on problems (and solutions!) in transactional middleware - by this, I mean systems that provide strong consistency guarantees: reliable queueing, distributed 2PC engines, higher-level quality-of-service guarantees for lower-level protocols (IIOP, (more...)

Linux 6 Transparent Huge Pages and Hadoop Workloads

This past week I spent some time setting up and running various Hadoop workloads on my CDH cluster. After some Hadoop jobs had been running for several minutes, I noticed something quite alarming — the system CPU percentages were extremely high.

Platform Details

This cluster comprises 2s8c16t (two-socket, eight-core, sixteen-thread) Xeon L5630 nodes with 96 GB of RAM, running CentOS Linux 6.2 with Java 1.6.0_30. The details are:

$ cat /etc/redhat-release
CentOS release 6.2 (Final)

$ uname -a
Linux chaos 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012  (more...)

Big Data


The other day I watched the Oracle Big Data forum, now available here. It was a half-day event with various speakers on the subject of Big Data, including Tom Kyte, a mentor whom I admire!

In the forum they went over Oracle's approach to Big Data; allow me to summarise it below:
  1. Acquire - Collect Big Data: identify it and where it lives, then store it in Oracle NoSQL Database - a key-value database (see the sketch after this list).

  2. Organise - Stage Big Data in a transient elastic database. Using Oracle Data Integrator and the Oracle Hadoop connector, reduce and distil it.

  3. Analyse - Start Analytics on (more...)
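
As a rough illustration of the "Acquire" step only - not code from the forum - the sketch below writes one raw event into an Oracle NoSQL Database store as a key-value pair and reads it back using the product's Java API. The store name, helper host/port and key structure are placeholders.

// Minimal sketch, assuming an Oracle NoSQL Database store named "kvstore"
// is reachable on localhost:5000 (both are placeholders).
import java.util.Arrays;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class AcquireSketch {

    public static void main(String[] args) {
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "localhost:5000"));

        // Key: a hypothetical major path of (event type, user, timestamp).
        Key key = Key.createKey(
                Arrays.asList("clickstream", "user42", "2013-10-01T12:00:00"));

        // Value: the raw event payload, stored as bytes.
        store.put(key, Value.createValue("page=/home;referrer=search".getBytes()));

        // Read the record back to confirm it was acquired.
        ValueVersion vv = store.get(key);
        System.out.println(new String(vv.getValue().getValue()));

        store.close();
    }
}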