Recently, Enkitec received an Oracle Big Data Appliance (BDA) for our server farm in Dallas (Thanks Accenture!). With this new addition to the server farm, I’m excited to see what the BDA can do and how to use it. Since I use Oracle SQL Developer for a lot of things, I figure I better see if I can connect to it…. wait I don’t have access yet, darn! Simple solution, I’ll just use the (more...)
In this post we will see how Kibana can be used to create visualisations over various sets of data that we have combined together. Kibana is a graphical front end for data held in ElasticSearch, which also provides the analytic capabilities. Previously we looked at where the data came from and exposing it through Hive, and then loading it into ElasticSearch. Here’s what we’ve built so far, the borders denoting what was covered in (more...)
In the first part of this series I described how I made several sets of data relating to the Rittman Mead blog from various sources available through Hive. This included blog hits from the Apache webserver log, tweets, and metadata from WordPress. Having got it into Hive I now need to get it into ElasticSearch as a pre-requisite for using Kibana to see how it holds up as an analysis tool or as a (more...)
I’ve recently started learning more about the tools and technologies that fall under the loose umbrella term of Big Data, following a lot of the blogs that Mark Rittman has written, including getting Apache log data into Hadoop, and bringing Twitter data into Hadoop via MongoDB.
What I wanted to do was visualise the data I’d brought in, looking for patterns and correlations. Obviously the de facto choice at our shop would (more...)
Information Technology units will continue to be challenged by the unbridled growth of their organization’s data stores. An ever-increasing amount of data needs to be extracted, cleansed, analyzed and presented to the end user community. Data volumes that were unheard of a year ago are now commonplace. Day-to-day operational systems are now storing such large amounts of data that they rival data warehouses in disk storage and administrative complexity. New trends, products, and strategies, (more...)
Let us move on from Grass Eating Sauropods and talk about who’s who in the analytic space.
Analytic companies are a dime a dozen. Everybody who provides a freaking dashboard is an analytic company. Anybody that merely mentions Google, Facebook, Hadoop etc. in the same sentence is somehow into BigData. Haven’t you stumbled across company pages where they claim to be experts in analytics and big data but they want you to schedule a (more...)
Digging into the Boston public dataset can reveal interesting and juicy facts.
Even though there is nothing juicy about bed bugs, the data on Boston’s open bed bug cases is quite interesting and worth looking at.
We uploaded the entire 50 MB data dump, around 500K rows, into the Data Visualizer and filtered the category for Bed Bugs. Splitting the date into its date hierarchy components, we then plotted the month (more...)
Last week I attended Oracle OpenWorld 2014, and it was an outstanding event filled with great people, awesome sessions, and a few notable experiences.
Personally I thought the messaging behind the conference itself wasn’t as amazing and upbeat as OpenWorld 2013, but that’s almost to be expected. Last year there was a ton of buzz around the introduction of Oracle 12c, Big Data was a buzzword that people were totally excited (more...)
Virtually everyone in the data space today claims that they are a Big Data vendor and that their products are Big Data products. Of course, if you are not in Big Data then you are legacy. So how do you know whether a product is a Big Data product?
While there might not be fully objective criteria (and mainly because the definition of Big Data is still up in the air and people interpret it as they see (more...)
I will give a presentation on 24 September at the Jury’s Inn in Dublin on the next generation of Big Data 2.0 tools and architecture.
Over the last two years there have been significant changes and improvements in the various Big Data frameworks. With the release of YARN (Hadoop 2.0), the most popular of these platforms now allows you to run mixed workloads. Gone are the days when Hadoop was only good for (more...)
For an organization to respond in real-time it needs to acquire or develop systems that can respond in real-time. Such systems need to be able to rapidly determine that a response is required and determine also what the appropriate and relevant response should be – they need to decide when and how to act. These kinds of decision-making systems are known as Decision Management Systems. To ensure that a response is delivered in real-time, more (more…)
Both ODI and the Hadoop ecosystem share a common design philosophy: bring the processing to the data rather than the other way around. Sounds logical, doesn’t it? Why move terabytes of data around your network if you can process it all in one place? Why invest millions in additional servers and hardware just to transform and process your data?
In the ODI world this approach is known as ELT. ELT is a marketing concept (more...)
Permission issues are among the most common errors when setting up a Hadoop cluster. While debugging one such error, I found the table below on http://hadoop.apache.org/. It’s a good scorecard to keep handy.
The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions:
You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we will update our website with our new offerings, products, and services. The article below summarises (more...)
A PDF version of the article can be downloaded here http://www.proligence.com/art/oracle_scene_summ14_hadoop.pdf