In the first three parts of this I talked about how Securing Big Data is about layers, and then about how you need to use the power of Big Data to secure Big Data, then how maths and machine learning helps to identify what is reasonable and was is anomalous.
The Target Credit Card hack highlights this problem. Alerts were made, lights did flash. The problem was that so many lights flashed and
In the first two parts of this I talked about how Securing Big Data is about layers, and then about how you need to use the power of Big Data to secure Big Data. The next part is "what do you do with all that data?". This is where Machine Learning and Mathematics comes in, in other words its about how you use Big Data analytics to secure Big Data.
What you want (more...)
In the first part of Securing Big Data I talked about the two different types of security. The traditional IT and ACL security that needs to be done to match traditional solutions with an RDBMS but that is pretty much where those systems stop in terms of security which means they don't address the real threats out there, which are to do with cyber attacks and social engineering. An ACL is only
As Big Data and its technologies such as Hadoop head deeper into the enterprise so questions around compliance and security rear their heads.
The first interesting point in this is that it shows the approach to security that many of the Silicon Valley companies that use Hadoop at scale have taken, namely pretty little really. It isn't that protecting information has been seen as a massively
I’m taking a few weeks defocused from work, as a kind of grandpaternity leave. That said, the venue for my Dances of Infant Calming is a small-but-nice apartment in San Francisco, so a certain amount of thinking about tech industries is inevitable. I even found time last Tuesday to meet or speak with my clients at WibiData, MemSQL, Cloudera, Citus Data, and MongoDB. And thus:
1. I’ve been sloppy in my terminology around “geo-distribution”, in (more...)
In the past 2 years, I have met many developers, architects that are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing.
You believe that you have a big data project?
Do not start with the installation of an Hadoop Cluster -- the "how"
Start to talk to business people to understand their problem -- the "why"
Understand the data you must
There have been rumblings from the HPC community indicating a general suspicion of and disdain for Big Data technology which would lead one to believe that whatever Google, Facebook and Twitter do with their supercomputers is not important enough to warrant seriousness—that social supercomputing is simply not worthy. A little of this emotion seems to […]
I’ve talked with many companies recently that believe they are:
- Focused on building a great data management and analytic stack for log management …
- … unlike all the other companies that might be saying the same thing …
- … and certainly unlike expensive, poorly-scalable Splunk …
- … and also unlike less-focused vendors of analytic RDBMS (which are also expensive) and/or Hadoop distributions.
At best, I think such competitive claims are overwrought. Still, it’s a genuinely (more...)
As if anyone needs to be reminded, there’s a ridiculous amount of hype surrounding clouds and big data. There’s always oodles of hype around any new technology that is not well understood—I believe the correct term for this is product marketing. There are at least seven deadly sins that can be committed when determining a […]
Big data continues to live up to its reputation for disruption as it gnaws away at all of the entrenched constituencies – IT silos, vendors, pricing models and now, careers; it’s about to get very personal. On the possibility side, there is a massive skills gap that needs to be filled. Everybody knows there aren't […]
Both ODI and the Hadoop ecosystem share a common design philosophy. Bring the processing to the data rather than the other way around. Sounds logical, doesn’t it? Why move Terabytes of data around your network if you can process it all in the one place. Why invest millions in additional servers and hardware just to transform and process your data?
In the ODI world this approach is known as ELT. ELT is a marketing concept (more...)
My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete (more...)
Oracle is announcing today what it’s calling “Oracle Big Data SQL”. As usual, I haven’t been briefed, but highlights seem to include:
- Oracle Big Data SQL is basically data federation using the External Tables capability of the Oracle DBMS.
- Unlike independent products — e.g. Cirro — Oracle Big Data SQL federates SQL queries only across Oracle offerings, such as the Oracle DBMS, the Oracle NoSQL offering, or Oracle’s Cloudera-based Hadoop appliance.
- Also unlike independent (more...)
The idea of clouds “meeting" big data or big data "living in" clouds isn’t simply marketing hype. Because big data followed so closely on the trend of cloud computing, both customers and vendors still struggle to understand the differences from their enterprise-centric perspectives. Everyone assumes that Hadoop can work in conventional clouds as easily as […]
Data lakes, like legacy storage arrays, are passive. They only hold data. Hadoop is an active reservoir, not a passive data lake. HDFS is a computational file system that can digest, filter and analyze any data in the reservoir, not just store it. HDFS is the only economically sustainable, computational file system in existence. Some […]
I was recently invited to speak about big data at the Rocky Mountain Oracle User's Group. I presentedto Oracle professionals who are faced with an onslaught of hype and mythology regarding Big Data in generaland Hadoop in particular. Most of the audience was familiar with the difficulty of attempting to engineer even Modest Data on […]
In the 1960s, Bank of America and IBM built one of the first credit card processing systems. Although those early mainframes processed just a fraction of the data compared to that of eBay or Amazon, the engineering was complex for the day. Once credit cards became popular, processing systems had to be built to handle […]
Scoop, Flume, PIG, Zookeeper. Do these mean anything to you? If they do then the odds are you are looking at Hadoop. The thing is that while that was cool a few years ago it really is time to face it that HDFS is a commodity, Map Reduce is interesting but not feasible for most users and the real question is how we turn all that raw data in HDFS into something we can actually (more...)
Hi Fellow Big Data Admirers ,
With big data and analytics playing an influential role helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos but instead to take a holistic approach toward it and define a base reference architecture even before contemplating positioning the necessary tools.
My latest print media article (5th in the series) for CIO magazine (ITNEXT) talks extensively about need of reference architecture in (more...)
Over the last few years there has been a trend of increased spending on BI, and that trend isn't going away. The analyst predictions however have, understandably, been based on the mentality that the choice was between a traditional EDW/DW model or Hadoop. With the new 'Business Data Lake' type of hybrid approach its pretty clear that the shift is underway for all vendors to have a hybrid