Oracle Data Integrator and Hadoop. Is ODI the only ETL tool for Big Data that works?

Both ODI and the Hadoop ecosystem share a common design philosophy. Bring the processing to the data rather than the other way around. Sounds logical, doesn’t it? Why move Terabytes of data around your network if you can process it all in the one place. Why invest millions in additional servers and hardware just to transform and process your data?

In the ODI world this approach is known as ELT. ELT is a marketing concept (more...)

Why Oracle Big Data SQL Potentially Solves a Big Issue with Hadoop Security

Oracle announced their Big Data SQL product a couple of weeks ago, which effectively extends Exadata’s query-offloading to Hadoop data sources. I covered the launch a few days afterwards, focusing on how it implements Exadata’s SmartScan on Hive and NoSQL data sources and provides a single metadata catalog over both relational, and Hadoop, data sources. In a Twitter conversation later in the day though, I made the comment that in my opinion, the biggest (more...)

Teradata bought Hadapt and Revelytix

My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete (more...)

Taking a Look at the New Oracle Big Data SQL

Oracle launched their Oracle Big Data SQL product earlier this week, and it’ll be of interest to anyone who saw our series of posts a few weeks ago about the updated Oracle Information Management Reference Architecture, where Hadoop now sits alongside traditional Oracle data warehouses to provide what’s termed a “data reservoir”. In this type of architecture, Hadoop and its underlying technologies HDFS, Hive and schema-on-read databases provide an extension to the more structured (more...)

War of the Hadoop SQL engines. And the winner is …?

You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we will update our website with our new offerings, products, and services. The article below summarises (more...)

The point of predicate pushdown

Oracle is announcing today what it’s calling “Oracle Big Data SQL”. As usual, I haven’t been briefed, but highlights seem to include:

  • Oracle Big Data SQL is basically data federation using the External Tables capability of the Oracle DBMS.
  • Unlike independent products — e.g. Cirro — Oracle Big Data SQL federates SQL queries only across Oracle offerings, such as the Oracle DBMS, the Oracle NoSQL offering, or Oracle’s Cloudera-based Hadoop appliance.
  • Also unlike independent (more...)

21st Century DBMS success and failure

As part of my series on the keys to and likelihood of success, I outlined some examples from the DBMS industry. The list turned out too long for a single post, so I split it up by millennia. The part on 20th Century DBMS success and failure went up Friday; in this one I’ll cover more recent events, organized in line with the original overview post. Categories addressed will include analytic RDBMS (including data (more...)

Online product recommendations using Hadoop

Online product recommendations using Hadoop

One of the leading portals on BigData, Dataconomy, had an interview with a colleague of mine on product recommendations systems. These are systems aimed towards personalizing content and recommending the ‘right’ products, in other words products that inspire customers. The article – The Science Behind the Finding the Perfect Product – is a nice read that covers quite some areas.

Stack of Hadoop nodesAt bol.com we use Hadoop for batches, and (more...)

Hadoop for Oracle Professionals Article on Oracle Scene

Oracle Scene (the publication of United Kingdom Oracle Users Group) has published my article "Hadoop for Oracle Professionals", where I have attempted, like many others, to demystify the terms such as Hadoop, Map/Reduce and Flume. If you were interested in Big Data and what all comes with understanding it, you might find it useful.

A PDF version of the article can be downloaded here http://www.proligence.com/art/oracle_scene_summ14_hadoop.pdf

Hadoop as a Cloud

The idea of clouds “meeting" big data or big data "living in" clouds isn’t simply marketing hype. Because big data followed so closely on the trend of cloud computing, both customers and vendors still struggle to understand the differences from their enterprise-centric perspectives. Everyone assumes that Hadoop can work in conventional clouds as easily as […]

More Than Just a Lake

Data lakes, like legacy storage arrays, are passive. They only hold data. Hadoop is an active reservoir, not a passive data lake. HDFS is a computational file system that can digest, filter and analyze any data in the reservoir, not just store it. HDFS is the only economically sustainable, computational file system in existence. Some […]

Hadoop is Not Merely a SQL EDW Platform

I was recently invited to speak about big data at the Rocky Mountain Oracle User's Group. I presentedto Oracle professionals who are faced with an onslaught of hype and mythology regarding Big Data in generaland Hadoop in particular. Most of the audience was familiar with the difficulty of attempting to engineer even Modest Data on […]

20th Century Stacks vs. 21st Century Stacks

In the 1960s, Bank of America and IBM built one of the first credit card processing systems. Although those early mainframes processed just a fraction of the data compared to that of eBay or Amazon, the engineering was complex for the day. Once credit cards became popular, processing systems had to be built to handle […]

How to select a Hadoop distro – stop thinking about Hadoop

Scoop, Flume, PIG, Zookeeper.  Do these mean anything to you?  If they do then the odds are you are looking at Hadoop.  The thing is that while that was cool a few years ago it really is time to face it that HDFS is a commodity, Map Reduce is interesting but not feasible for most users and the real question is how we turn all that raw data in HDFS into something we can actually (more...)

RM BI Forum 2014 Notes – Cloudera Hadoop Masterclass

The Rittman Mead BI Forum started off with a one-day Hadoop Masterclass, provided by Lars George.  As he messaged us the day before we have learned what Hadoop is all about, what its major components are, how to acquire, processes and provide data as part of a production data processing pipeline. To that effect, Lars advised that […]

Need for Defining Reference Architecture For Big Data

Hi Fellow Big Data Admirers ,

With big data and analytics playing an influential role helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos but instead to take a holistic approach toward it and define a base reference architecture even before contemplating positioning the necessary tools. 

My latest print media article (5th in the series) for CIO magazine (ITNEXT) talks extensively about need of reference architecture in (more...)

Data Lakes will replace EDWs – a prediction

Over the last few years there has been a trend of increased spending on BI, and that trend isn't going away.  The analyst predictions however have, understandably, been based on the mentality that the choice was between a traditional EDW/DW model or Hadoop.  With the new 'Business Data Lake' type of hybrid approach its pretty clear that the shift is underway for all vendors to have a hybrid

MapR Sandbox for Hadoop Learning

I got email about MapR Sandbox, that is a fully functional Hadoop cluster running on a virtual machine (CentOS 6.5) that provides an intuitive web interface for both developers and administrators to get started with Hadoop. I belief it's a good idea to learn about Hadoop and its ecosystem. Users can download for VMware VM or VirtualBox. I downloaded for VirtualBox and imported it. I changed about network to use "Bridged Adapter". After started... (more...)

NoSQL? No Thanks

There continues to be a disproportionate amount of hype around 'NoSQL' data stores.  By disproportionate I mean 'completely and utterly out of scale with the actual problems of the vast majority of companies'.  I wrote before about 'how NoSQL became more SQL'.  The point I made there is now more apparent the more I work with companies on Big Data challenges. There are three worlds of data

Big Data : Right Approach Right Solution

 

Hi All,

Past few months I have been meeting with clients and discussing their potential need of Big Data. The discuss gets to the bottom of , do they really need the Big Data ? The below link to my ITNext article talks about As big data goes bigger,IT managers are challenged with the task of identifying data that qualifies for big and finding appropriate solutions to process it.

Click Here To Read Full Article (more...)