Thoughts and notes, Thanksgiving weekend 2014

I’m taking a few weeks defocused from work, as a kind of grandpaternity leave. That said, the venue for my Dances of Infant Calming is a small-but-nice apartment in San Francisco, so a certain amount of thinking about tech industries is inevitable. I even found time last Tuesday to meet or speak with my clients at WibiData, MemSQL, Cloudera, Citus Data, and MongoDB. And thus:

1. I’ve been sloppy in my terminology around “geo-distribution”, in (more...)

Big Data… Is Hadoop the good way to start?

In the past 2 years, I have met many developers, architects that are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing. TL;TR You believe that you have a big data project? Do not start with the installation of an Hadoop Cluster -- the "how" Start to talk to business people to understand their problem -- the "why" Understand the data you must

HPC versus HDFS: Scientific versus Social

There have been rumblings from the HPC community indicating a general suspicion of and disdain for Big Data technology which would lead one to believe that whatever Google, Facebook and Twitter do with their supercomputers is not important enough to warrant seriousness—that social supercomputing is simply not worthy.  A little of this emotion seems to […]

An idealized log management and analysis system — from whom?

I’ve talked with many companies recently that believe they are:

  • Focused on building a great data management and analytic stack for log management …
  • … unlike all the other companies that might be saying the same thing :)
  • … and certainly unlike expensive, poorly-scalable Splunk …
  • … and also unlike less-focused vendors of analytic RDBMS (which are also expensive) and/or Hadoop distributions.

At best, I think such competitive claims are overwrought. Still, it’s a genuinely (more...)

The 7 Deadly Sins of Cloud Computing

As if anyone needs to be reminded, there’s a ridiculous amount of hype surrounding clouds and big data. There’s always oodles of hype around any new technology that is not well understood—I believe the correct term for this is product marketing. There are at least seven deadly sins that can be committed when determining a […]

The Big Data Job Gap: Where are the Platform Scientists?

Big data continues to live up to its reputation for disruption as it gnaws away at all of the entrenched constituencies – IT silos, vendors, pricing models and now, careers; it’s about to get very personal. On the possibility side, there is a massive skills gap that needs to be filled. Everybody knows there aren't […]

Oracle Data Integrator and Hadoop. Is ODI the only ETL tool for Big Data that works?

Both ODI and the Hadoop ecosystem share a common design philosophy. Bring the processing to the data rather than the other way around. Sounds logical, doesn’t it? Why move Terabytes of data around your network if you can process it all in the one place. Why invest millions in additional servers and hardware just to transform and process your data?

In the ODI world this approach is known as ELT. ELT is a marketing concept (more...)

Teradata bought Hadapt and Revelytix

My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete (more...)

The point of predicate pushdown

Oracle is announcing today what it’s calling “Oracle Big Data SQL”. As usual, I haven’t been briefed, but highlights seem to include:

  • Oracle Big Data SQL is basically data federation using the External Tables capability of the Oracle DBMS.
  • Unlike independent products — e.g. Cirro — Oracle Big Data SQL federates SQL queries only across Oracle offerings, such as the Oracle DBMS, the Oracle NoSQL offering, or Oracle’s Cloudera-based Hadoop appliance.
  • Also unlike independent (more...)

21st Century DBMS success and failure

As part of my series on the keys to and likelihood of success, I outlined some examples from the DBMS industry. The list turned out too long for a single post, so I split it up by millennia. The part on 20th Century DBMS success and failure went up Friday; in this one I’ll cover more recent events, organized in line with the original overview post. Categories addressed will include analytic RDBMS (including data (more...)

Hadoop for Oracle Professionals Article on Oracle Scene

Oracle Scene (the publication of United Kingdom Oracle Users Group) has published my article "Hadoop for Oracle Professionals", where I have attempted, like many others, to demystify the terms such as Hadoop, Map/Reduce and Flume. If you were interested in Big Data and what all comes with understanding it, you might find it useful.

A PDF version of the article can be downloaded here http://www.proligence.com/art/oracle_scene_summ14_hadoop.pdf

Hadoop as a Cloud

The idea of clouds “meeting" big data or big data "living in" clouds isn’t simply marketing hype. Because big data followed so closely on the trend of cloud computing, both customers and vendors still struggle to understand the differences from their enterprise-centric perspectives. Everyone assumes that Hadoop can work in conventional clouds as easily as […]

More Than Just a Lake

Data lakes, like legacy storage arrays, are passive. They only hold data. Hadoop is an active reservoir, not a passive data lake. HDFS is a computational file system that can digest, filter and analyze any data in the reservoir, not just store it. HDFS is the only economically sustainable, computational file system in existence. Some […]

Hadoop is Not Merely a SQL EDW Platform

I was recently invited to speak about big data at the Rocky Mountain Oracle User's Group. I presentedto Oracle professionals who are faced with an onslaught of hype and mythology regarding Big Data in generaland Hadoop in particular. Most of the audience was familiar with the difficulty of attempting to engineer even Modest Data on […]

20th Century Stacks vs. 21st Century Stacks

In the 1960s, Bank of America and IBM built one of the first credit card processing systems. Although those early mainframes processed just a fraction of the data compared to that of eBay or Amazon, the engineering was complex for the day. Once credit cards became popular, processing systems had to be built to handle […]

How to select a Hadoop distro – stop thinking about Hadoop

Scoop, Flume, PIG, Zookeeper.  Do these mean anything to you?  If they do then the odds are you are looking at Hadoop.  The thing is that while that was cool a few years ago it really is time to face it that HDFS is a commodity, Map Reduce is interesting but not feasible for most users and the real question is how we turn all that raw data in HDFS into something we can actually (more...)

Need for Defining Reference Architecture For Big Data

Hi Fellow Big Data Admirers ,

With big data and analytics playing an influential role helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos but instead to take a holistic approach toward it and define a base reference architecture even before contemplating positioning the necessary tools. 

My latest print media article (5th in the series) for CIO magazine (ITNEXT) talks extensively about need of reference architecture in (more...)

Data Lakes will replace EDWs – a prediction

Over the last few years there has been a trend of increased spending on BI, and that trend isn't going away.  The analyst predictions however have, understandably, been based on the mentality that the choice was between a traditional EDW/DW model or Hadoop.  With the new 'Business Data Lake' type of hybrid approach its pretty clear that the shift is underway for all vendors to have a hybrid

Big Data : Right Approach Right Solution

 

Hi All,

Past few months I have been meeting with clients and discussing their potential need of Big Data. The discuss gets to the bottom of , do they really need the Big Data ? The below link to my ITNext article talks about As big data goes bigger,IT managers are challenged with the task of identifying data that qualifies for big and finding appropriate solutions to process it.

Click Here To Read Full Article (more...)

My history with Big Data

Before I joined Cloudera, I hadn't had much formal experience with Big Data. But I had crossed paths with one of its major use cases before, so I found it easy to pick up the mindset. My previous big project involved a relational database hooked up to a web server. (more...)