I’ve talked with many companies recently that believe they are:
- Focused on building a great data management and analytic stack for log management …
- … unlike all the other companies that might be saying the same thing …
- … and certainly unlike expensive, poorly-scalable Splunk …
- … and also unlike less-focused vendors of analytic RDBMS (which are also expensive) and/or Hadoop distributions.
At best, I think such competitive claims are overwrought. Still, it’s a genuinely (more...)
As if anyone needs to be reminded, there’s a ridiculous amount of hype surrounding clouds and big data. There’s always oodles of hype around any new technology that is not well understood—I believe the correct term for this is product marketing. There are at least seven deadly sins that can be committed when determining a […]
Big data continues to live up to its reputation for disruption as it gnaws away at all of the entrenched constituencies – IT silos, vendors, pricing models and now, careers; it’s about to get very personal. On the possibility side, there is a massive skills gap that needs to be filled. Everybody knows there aren't […]
Both ODI and the Hadoop ecosystem share a common design philosophy. Bring the processing to the data rather than the other way around. Sounds logical, doesn’t it? Why move Terabytes of data around your network if you can process it all in the one place. Why invest millions in additional servers and hardware just to transform and process your data?
In the ODI world this approach is known as ELT. ELT is a marketing concept (more...)
My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete (more...)
You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we will update our website with our new offerings, products, and services. The article below summarises (more...)
Oracle is announcing today what it’s calling “Oracle Big Data SQL”. As usual, I haven’t been briefed, but highlights seem to include:
- Oracle Big Data SQL is basically data federation using the External Tables capability of the Oracle DBMS.
- Unlike independent products — e.g. Cirro — Oracle Big Data SQL federates SQL queries only across Oracle offerings, such as the Oracle DBMS, the Oracle NoSQL offering, or Oracle’s Cloudera-based Hadoop appliance.
- Also unlike independent (more...)
As part of my series on the keys to and likelihood of success, I outlined some examples from the DBMS industry. The list turned out too long for a single post, so I split it up by millennia. The part on 20th Century DBMS success and failure went up Friday; in this one I’ll cover more recent events, organized in line with the original overview post. Categories addressed will include analytic RDBMS (including data (more...)
Online product recommendations using Hadoop
One of the leading portals on BigData, Dataconomy, had an interview with a colleague of mine on product recommendations systems. These are systems aimed towards personalizing content and recommending the ‘right’ products, in other words products that inspire customers. The article – The Science Behind the Finding the Perfect Product – is a nice read that covers quite some areas.
At bol.com we use Hadoop for batches, and (more...)
Oracle Scene (the publication of United Kingdom Oracle Users Group) has published my article "Hadoop for Oracle Professionals", where I have attempted, like many others, to demystify the terms such as Hadoop, Map/Reduce and Flume. If you were interested in Big Data and what all comes with understanding it, you might find it useful.
A PDF version of the article can be downloaded here http://www.proligence.com/art/oracle_scene_summ14_hadoop.pdf
The idea of clouds “meeting" big data or big data "living in" clouds isn’t simply marketing hype. Because big data followed so closely on the trend of cloud computing, both customers and vendors still struggle to understand the differences from their enterprise-centric perspectives. Everyone assumes that Hadoop can work in conventional clouds as easily as […]
Data lakes, like legacy storage arrays, are passive. They only hold data. Hadoop is an active reservoir, not a passive data lake. HDFS is a computational file system that can digest, filter and analyze any data in the reservoir, not just store it. HDFS is the only economically sustainable, computational file system in existence. Some […]
I was recently invited to speak about big data at the Rocky Mountain Oracle User's Group. I presentedto Oracle professionals who are faced with an onslaught of hype and mythology regarding Big Data in generaland Hadoop in particular. Most of the audience was familiar with the difficulty of attempting to engineer even Modest Data on […]
In the 1960s, Bank of America and IBM built one of the first credit card processing systems. Although those early mainframes processed just a fraction of the data compared to that of eBay or Amazon, the engineering was complex for the day. Once credit cards became popular, processing systems had to be built to handle […]
Scoop, Flume, PIG, Zookeeper. Do these mean anything to you? If they do then the odds are you are looking at Hadoop. The thing is that while that was cool a few years ago it really is time to face it that HDFS is a commodity, Map Reduce is interesting but not feasible for most users and the real question is how we turn all that raw data in HDFS into something we can actually (more...)
The Rittman Mead BI Forum started off with a one-day Hadoop Masterclass, provided by Lars George. As he messaged us the day before we have learned what Hadoop is all about, what its major components are, how to acquire, processes and provide data as part of a production data processing pipeline. To that effect, Lars advised that […]
Hi Fellow Big Data Admirers ,
With big data and analytics playing an influential role helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos but instead to take a holistic approach toward it and define a base reference architecture even before contemplating positioning the necessary tools.
My latest print media article (5th in the series) for CIO magazine (ITNEXT) talks extensively about need of reference architecture in (more...)
Over the last few years there has been a trend of increased spending on BI, and that trend isn't going away. The analyst predictions however have, understandably, been based on the mentality that the choice was between a traditional EDW/DW model or Hadoop. With the new 'Business Data Lake' type of hybrid approach its pretty clear that the shift is underway for all vendors to have a hybrid
I got email about MapR Sandbox
, that is a fully functional Hadoop cluster running on a virtual machine (CentOS 6.5) that provides an intuitive web interface for both developers and administrators to get started with Hadoop. I belief it's a good idea to learn about Hadoop and its ecosystem. Users can download for VMware VM or VirtualBox. I downloaded for VirtualBox and imported it. I changed about network to use "Bridged Adapter". After started... (more...)
There continues to be a disproportionate amount of hype around 'NoSQL' data stores. By disproportionate I mean 'completely and utterly out of scale with the actual problems of the vast majority of companies'. I wrote before about 'how NoSQL became more SQL'. The point I made there is now more apparent the more I work with companies on Big Data challenges.
There are three worlds of data