I’m very pleased to announce that the Call for Papers for the Rittman Mead BI Forum 2015 is now open, with abstract submissions open to January 18th 2015. As in previous years the BI Forum will run over consecutive weeks in Brighton, UK and Atlanta, GA, with the provisional dates and venues as below:
- Brighton, UK : Hotel Seattle, Brighton, UK : May 6th – 8th 2015
- Atlanta, GA : Renaissance Atlanta Midtown Hotel, Atlanta, USA (more...)
These days Hortonworks with their IPO and Cloudera sitting on $1bn of cash grab all the headlines. However,the real visionary in the field is someone else. Someone blasting the previous world record in TeraSort . A Hadoop distribution on both Amazon Web Services and the Google Compute Engine. A company that Google is invested in. While their competitors have been in skirmishes with each other, MapR has been quietly working away and innovating.
MapR-FS: Features and (more...)
In the first two posts in this three part series on going beyond MapReduce for Hadoop ETL, we looked at why MapReduce and Hadoop 1.0 was only really suitable for batch processing, and how the new Apache Tez framework enabled by Apache YARN on the Hadoop 2.0 platform can be swapped-in for MapReduce to improve the performance of existing Pig and Hive scripts. Today though in the final post I want to take (more...)
In the first post in this three part series on going beyond MapReduce for Hadoop ETL, I looked at how a typical Apache Pig script gets compiled into a series of MapReduce jobs, and those MapReduce jobs pass data between themselves by writing intermediate resultsets to disk (HDFS, the Hadoop cluster file system). As a reminder, here’s the Pig script we’re working with:
raw_logs = LOAD '/user/mrittman/rm_logs' USING TextLoader AS (line:chararray);
Over the previous few months I’ve been looking at the various ways you can load data into Hadoop, process it and then report on it using Oracle tools. We’ve looked at Apache Hive and how it provides a SQL layer over Hadoop, making it possible for tools like ODI and OBIEE to use their usual SQL set-based process approach to access Hadoop data; later on, we looked at another Hadoop tool, Apache Pig, (more...)
How has the interest in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards changed over the years?
One easy way to gauge the interest is to measure how much news is generated for the related term and Google Trends allows you do that very easily.
After plugging all of the above terms in Google trends and further analysis leads to the following visualizations.
Aggregating the results by year
It is very amazing to see (more...)
In the past 2 years, I have met many developers, architects that are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing.
You believe that you have a big data project?
Do not start with the installation of an Hadoop Cluster -- the "how"
Start to talk to business people to understand their problem -- the "why"
Understand the data you must
Recently, Enkitec received an Oracle Big Data Appliance (BDA) for our server farm in Dallas (Thanks Accenture!). With this new addition to the server farm, I’m excited to see what the BDA can do and how to use it. Since I use Oracle SQL Developer for a lot of things, I figure I better see if I can connect to it…. wait I don’t have access yet, darn! Simple solution, I’ll just use the (more...)
Information Technology units will continue to be challenged by the unbridled growth of their organization’s data stores. An ever-increasing amount of data needs to be extracted, cleansed, analyzed and presented to the end user community. Data volumes that were unheard of a year ago are now commonplace. Day-to-day operational systems are now storing such large amounts of data that they rival data warehouses in disk storage and administrative complexity. New trends, products, and strategies, (more...)
Let us move on from Grass Eating Sauropods and talk about who’s who in the analytic space.
For every dime there are dozen analytic companies. Everybody who provides a freaking dashboard is an analytic company. Anybody that merely mentions Google, Facebook, Hadoop etc in the same sentence is somehow into BigData. Haven’t you stumbled across company pages where they claim to be expert in analytics and big data but they want you to schedule a (more...)
Digging into the Boston public Dataset can reveal interesting and juicy facts.
Even though there is nothing juicy about Bed bugs but the data about Boston open cases for Bed bugs is quite interesting and worth looking at.
We uploaded the entire 50 mb data dump which is around 500K rows into the Data Visualizer and filtered the category for Bed Bugs. Splitting the date into its date hierarchy components we then plotted the month (more...)
This is quick. Saw it on Twitter this morning and it is just too funny to not share: Best slide of #Strataconf already? pic.twitter.com/hy8LnEnWhk — Matt Aslett (@maslett) October 16, 2014 Have a great day!Filed under: Big Data, Quotes, User Groups Tagged: #bigdata, quote, Strataconf
Last week I attended Oracle OpenWorld 2014, and it was an outstanding event filled with great people, awesome sessions, and a few outstanding notable experiences.
Personally I thought the messaging behind the conference itself wasn’t as amazing and upbeat as OpenWorld 2013, but that’s almost to be expected. Last year there was a ton of buzz around the introduction of Oracle 12c, Big Data was a buzzword that people were totally excited (more...)
I will give a presentation on 24 September at the Jury’s Inn in Dublin on the next generation of Big Data 2.0 tools and architecture.
Over the last two years there have been significant changes and improvements in the various Big Data frameworks. With the release of Yarn (Hadoop 2.0) the most popular of these platforms now allows you to run mixed workloads. Gone are the days when Hadoop was only good for (more...)
For an organization to respond in real-time it needs to acquire or develop systems
that can respond in real-time. Such systems need to be able to rapidly
determine that a response is required and determine also what the
appropriate and relevant response should be – they need to decide when
and how to act. These kinds of decision-making systems are known as
Decision Management Systems. To ensure that a response is delivered in
real-time, more (more…)
Both ODI and the Hadoop ecosystem share a common design philosophy. Bring the processing to the data rather than the other way around. Sounds logical, doesn’t it? Why move Terabytes of data around your network if you can process it all in the one place. Why invest millions in additional servers and hardware just to transform and process your data?
In the ODI world this approach is known as ELT. ELT is a marketing concept (more...)
Permission issues is one of the key error , while setting up Hadoop Cluster, while debugging some error found below table on http://hadoop.apache.org/ . It’s a good scorecard to keep handy.
Permissions for both HDFS and local fileSystem paths
The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions:
You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we will update our website with our new offerings, products, and services. The article below summarises (more...)
In every change there are hype machines that over play and sages who call doom. Into the Big Data arena steps David Searls to proclaim that Big Data is a myth and simply hype which is set to burst in an article over at ZDNet.
But big data, he said, is nothing more than the myth that collecting vast amounts of data can help companies know customers better than those customers even know
My paper on NoSQL and Big Data won the Editor’s Choice award at ODTUG Kscope14. Here are some key points from the paper: The relational camp made serious mistakes that limited the performance and usefulness of the relational model. NoSQL is based on the incorrect premise that tables in the relational model must be mapped to […]