Rittman Mead BI Forum 2015 Call for Papers Now Open!

I’m very pleased to announce that the Call for Papers for the Rittman Mead BI Forum 2015 is now open, with abstract submissions open to January 18th 2015. As in previous years the BI Forum will run over consecutive weeks in Brighton, UK and Atlanta, GA, with the provisional dates and venues as below:

  • Brighton, UK : Hotel Seattle, Brighton, UK : May 6th – 8th 2015
  • Atlanta, GA : Renaissance Atlanta Midtown Hotel, Atlanta, USA (more...)

What makes MapR superior to other Hadoop distributions?

These days Hortonworks with their IPO and Cloudera sitting on $1bn of cash grab all the headlines. However,the real visionary in the field is someone else. Someone blasting the previous world record in TeraSort . A Hadoop distribution on both Amazon Web Services and the Google Compute Engine. A company that Google is invested in. While their competitors have been in skirmishes with each other, MapR has been quietly working away and innovating.

MapR-FS: Features and (more...)

Going Beyond MapReduce for Hadoop ETL Pt.3 : Introducing Apache Spark

In the first two posts in this three part series on going beyond MapReduce for Hadoop ETL, we looked at why MapReduce and Hadoop 1.0 was only really suitable for batch processing, and how the new Apache Tez framework enabled by Apache YARN on the Hadoop 2.0 platform can be swapped-in for MapReduce to improve the performance of existing Pig and Hive scripts. Today though in the final post I want to take (more...)

Going Beyond MapReduce for Hadoop ETL Pt.2 : Introducing Apache YARN and Apache Tez

In the first post in this three part series on going beyond MapReduce for Hadoop ETL, I looked at how a typical Apache Pig script gets compiled into a series of MapReduce jobs, and those MapReduce jobs pass data between themselves by writing intermediate resultsets to disk (HDFS, the Hadoop cluster file system). As a reminder, here’s the Pig script we’re working with:

register /opt/cloudera/parcels/CDH/lib/pig/piggybank.jar
raw_logs = LOAD '/user/mrittman/rm_logs' USING TextLoader AS (line:chararray);
logs_base  (more...)

Going Beyond MapReduce for Hadoop ETL Pt.1 : Why MapReduce Is Only for Batch Processing

Over the previous few months I’ve been looking at the various ways you can load data into Hadoop, process it and then report on it using Oracle tools. We’ve looked at Apache Hive and how it provides a SQL layer over Hadoop, making it possible for tools like ODI and OBIEE to use their usual SQL set-based process approach to access Hadoop data; later on, we looked at another Hadoop tool, Apache Pig, (more...)

Trends in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards

How has the interest in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards changed over the years?

One easy way to gauge the interest is to measure how much news is generated for the related term and Google Trends allows you do that very easily.

After plugging all of the above terms in Google trends and further analysis leads to the following visualizations.

Aggregating the results by year

Image

 

It is very amazing to see (more...)

Big Data… Is Hadoop the good way to start?

In the past 2 years, I have met many developers, architects that are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing. TL;TR You believe that you have a big data project? Do not start with the installation of an Hadoop Cluster -- the "how" Start to talk to business people to understand their problem -- the "why" Understand the data you must

SQL Developer and Big Data Appliance (sort of)

Recently, Enkitec received an Oracle Big Data Appliance (BDA) for our server farm in Dallas (Thanks Accenture!).  With this new addition to the server farm, I’m excited to see what the BDA can do and how to use it.  Since I use Oracle SQL Developer for a lot of things, I figure I better see if I can connect to it…. wait I don’t have access yet, darn!  Simple solution, I’ll just use the (more...)

Data Warehouse Appliance Offerings

Introduction

Information Technology units will continue to be challenged by the unbridled growth of their organization’s data stores. An ever-increasing amount of data needs to be extracted, cleansed, analyzed and presented to the end user community. Data volumes that were unheard of a year ago are now commonplace. Day-to-day operational systems are now storing such large amounts of data that they rival data warehouses in disk storage and administrative complexity. New trends, products, and strategies, (more...)

Top 100 analytics companies ranked and scored by Mattermark

Let us move on from Grass Eating Sauropods and talk about who’s who in the analytic space.

For every dime there are dozen analytic companies. Everybody who provides a freaking dashboard is an analytic company. Anybody that merely mentions Google, Facebook, Hadoop etc in the same sentence is somehow into BigData. Haven’t you stumbled across company pages where they claim to be expert in analytics and big data but they want you to schedule a (more...)

Bed bugs in Boston – Analysis of Boston 311 public dataset

Digging into the Boston public Dataset can reveal interesting and juicy facts.

Even though there is nothing juicy about Bed bugs but the data about Boston open cases for Bed bugs is quite interesting and worth looking at.

We uploaded the entire 50 mb data dump which is around 500K rows into the Data Visualizer and filtered the category for Bed Bugs. Splitting the date into its date hierarchy components we then plotted the month (more...)

Say “Big Data” One More Time (I dare you!)

This is quick. Saw it on Twitter this morning and it is just too funny to not share: Best slide of #Strataconf already? pic.twitter.com/hy8LnEnWhk — Matt Aslett (@maslett) October 16, 2014 Have a great day!Filed under: Big Data, Quotes, User Groups Tagged: #bigdata, quote, Strataconf

Another Great OpenWorld

Steve at the Delphix Booth

Last week I attended Oracle OpenWorld 2014, and it was an outstanding event filled with great people, awesome sessions, and a few outstanding notable experiences.

Personally I thought the messaging behind the conference itself wasn’t as amazing and upbeat as OpenWorld 2013, but that’s almost to be expected. Last year there was a ton of buzz around the introduction of Oracle 12c, Big Data was a buzzword that people were totally excited (more...)

Big Data 2.0 and Agile BI all at Irish BI OUG (24 September).

I will give a presentation on 24 September at the Jury’s Inn in Dublin on the next generation of Big Data 2.0 tools and architecture.

Over the last two years there have been significant changes and improvements in the various Big Data frameworks. With the release of Yarn (Hadoop 2.0) the most popular of these platforms now allows you to run mixed workloads. Gone are the days when Hadoop was only good for (more...)

Responding in Real-Time with Big Data By Mala Ramakrishnan

| Aug 4, 2014

For an organization to respond in real-time it needs to acquire or develop systems
that can respond in real-time. Such systems need to be able to rapidly
determine that a response is required and determine also what the
appropriate and relevant response should be – they need to decide when
and how to act. These kinds of decision-making systems are known as
Decision Management Systems. To ensure that a response is delivered in
real-time, more (more…)

Oracle Data Integrator and Hadoop. Is ODI the only ETL tool for Big Data that works?

Both ODI and the Hadoop ecosystem share a common design philosophy. Bring the processing to the data rather than the other way around. Sounds logical, doesn’t it? Why move Terabytes of data around your network if you can process it all in the one place. Why invest millions in additional servers and hardware just to transform and process your data?

In the ODI world this approach is known as ELT. ELT is a marketing concept (more...)

Permissions for both HDFS and local fileSystem paths

| Jul 18, 2014

Hi All,

Permission issues is one of the key error , while setting up Hadoop Cluster, while debugging some error found below table on http://hadoop.apache.org/ . It’s a good scorecard to keep handy.

 

Permissions for both HDFS and local fileSystem paths

The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions:

Filesystem Path User:Group Permissions
local dfs.namenode.name.dir hdfs:hadoop drwx——
local dfs.datanode.data.dir (more...)

War of the Hadoop SQL engines. And the winner is …?

You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we will update our website with our new offerings, products, and services. The article below summarises (more...)

Big Data doom mongers need to look outside of the marketing department

In every change there are hype machines that over play and sages who call doom.  Into the Big Data arena steps David Searls to proclaim that Big Data is a myth and simply hype which is set to burst in an article over at ZDNet. But big data, he said, is nothing more than the myth that collecting vast amounts of data can help companies know customers better than those customers even know

Editor’s Choice award at ODTUG Kscope14: NoSQL and Big Data for the Oracle Professional

My paper on NoSQL and Big Data won the Editor’s Choice award at ODTUG Kscope14. Here are some key points from the paper: The relational camp made serious mistakes that limited the performance and usefulness of the relational model. NoSQL is based on the incorrect premise that tables in the relational model must be mapped to […]