In yesterday’s post I looked at Oracle Big Data Discovery and how it brought the search and analytic capabilities of Endeca to Hadoop. We looked at how the Oracle Endeca Information Discovery Studio application works with a version of the Endeca Server engine to analyse and visualise sample sets of data from the Hadoop cluster, and how it uses Apache Spark to retrieve data from Hadoop and then transform that data to make it more (more...)
We got excellent feedback for our first Hadoop User Group Ireland meetup. We wined, dined, and entertained more than 100 Hadoopers (and there was even beer left at the end of the night).
If you want to find out more about Sonra’s Hadoop Data Warehouse Quick Starter Solutions you can contact me or connect with me on LinkedIn.
For those of you who missed the event I have posted some pictures below. We have recorded (more...)
In this blog post we look at how we can address a shortcoming in the Hive ALTER TABLE statement using parameters and variables in the Hive CLI (Hive 0.13 was used).
There’s a simple way to query Hive parameter values directly from the CLI.
You simply execute (without specifying the value to be set):
You may use those parameters directly in (more...)
Join MapR and Sonra for the Hadoop User Group Ireland Meetup on 23 February at 6 pm at the Wayra offices (O2/Three building). You’ll learn more about the MapR distribution for Apache Hadoop through use cases, case studies and an introduction to the benefits of using the MapR platform.
Come by for this content-packed first event ending with the opportunity to socialise over beer and pizza kindly provided by Sonra.
What is (more...)
If you want to upskill and get certified on Hadoop, you can now do so for free, thanks to MapR. Over the next couple of weeks they are rolling out their on-demand Hadoop training courses. The highlight of the first batch of courses is Developing Hadoop Applications on YARN.
Data isn't really respected in businesses. You can see that because, unlike other corporate assets, there is rarely a decent corporate catalog that shows what exists and who has it. In the vast majority of companies there is more effort and automation put into tracking laptops than into cataloging and curating information.
Historically we've sort of been able to get away with this
Over six parts I've gone through a bit of a journey on what Big Data Security is all about.
Securing Big Data is about layers
Use the power of Big Data to secure Big Data
How maths and machine learning help
Why it's how you alert that matters
Why Information Security is part of Information Governance
Classifying Risk and the importance of Meta-Data
The fundamental point here is that
Last night I attended an event powered by oGH and OBUG. Mark Rittman was invited to talk about ‘Hadoop and Oracle Technologies on BI Projects’. The event was organized to inform us about Hadoop combined with Oracle technologies, and it was also meant as the start-up of a BI / Warehousing SIG.
So now that your Information Governance groups consider Information Security to be important, you then have to think about how they should be classifying the risk. There are documents out there that talk about frameworks for this. British Columbia's government has one, for instance, that talks about High, Medium and Low risk, but for me that really misses the point and oversimplifies the
What does your security team look like today?
Or the IT equivalent, "the folks that say no". The point is that in most companies information security isn't actually considered important. How do I know this? Well, because most IT Security teams are the equivalent of nightclub bouncers: they aren't the people who own the club, they aren't as important as the
In the first three parts of this I talked about how Securing Big Data is about layers, then about how you need to use the power of Big Data to secure Big Data, and then about how maths and machine learning help to identify what is reasonable and what is anomalous.
The Target Credit Card hack highlights this problem. Alerts were made, lights did flash. The problem was that so many lights flashed and
In the first two parts of this I talked about how Securing Big Data is about layers, and then about how you need to use the power of Big Data to secure Big Data. The next part is "what do you do with all that data?". This is where Machine Learning and Mathematics come in; in other words, it's about how you use Big Data analytics to secure Big Data.
What you want (more...)
In the first part of Securing Big Data I talked about the two different types of security. There is the traditional IT and ACL security that needs to be done to match traditional solutions with an RDBMS, but that is pretty much where those systems stop in terms of security, which means they don't address the real threats out there: cyber attacks and social engineering. An ACL is only
As Big Data and its technologies such as Hadoop head deeper into the enterprise, questions around compliance and security rear their heads.
The first interesting point in this is that it shows the approach to security that many of the Silicon Valley companies that use Hadoop at scale have taken, namely pretty little really. It isn't that protecting information has been seen as a massively
How has the interest in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards changed over the years?
One easy way to gauge the interest is to measure how much news is generated for each related term, and Google Trends allows you to do that very easily.
Plugging all of the above terms into Google Trends and analysing the results further leads to the following visualizations.
Aggregating the results by year
It is very amazing to see (more...)
I’m taking a few weeks defocused from work, as a kind of grandpaternity leave. That said, the venue for my Dances of Infant Calming is a small-but-nice apartment in San Francisco, so a certain amount of thinking about tech industries is inevitable. I even found time last Tuesday to meet or speak with my clients at WibiData, MemSQL, Cloudera, Citus Data, and MongoDB. And thus:
1. I’ve been sloppy in my terminology around “geo-distribution”, in (more...)
In the past 2 years, I have met many developers and architects who are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing.
You believe that you have a big data project?
Do not start with the installation of a Hadoop cluster -- the "how"
Start to talk to business people to understand their problem -- the "why"
Understand the data you must
There have been rumblings from the HPC community indicating a general suspicion of and disdain for Big Data technology which would lead one to believe that whatever Google, Facebook and Twitter do with their supercomputers is not important enough to warrant seriousness—that social supercomputing is simply not worthy. A little of this emotion seems to […]
I’ve talked with many companies recently that believe they are:
- Focused on building a great data management and analytic stack for log management …
- … unlike all the other companies that might be saying the same thing …
- … and certainly unlike expensive, poorly-scalable Splunk …
- … and also unlike less-focused vendors of analytic RDBMS (which are also expensive) and/or Hadoop distributions.
At best, I think such competitive claims are overwrought. Still, it’s a genuinely (more...)
As if anyone needs to be reminded, there’s a ridiculous amount of hype surrounding clouds and big data. There’s always oodles of hype around any new technology that is not well understood—I believe the correct term for this is product marketing. There are at least seven deadly sins that can be committed when determining a […]