ETL Offload with Spark and Amazon EMR – Part 5 – Summary

This is the final article in a series documenting an exercise that we undertook for a client recently. Currently running an Oracle-based datawarehouse platform, the client asked for our help in understanding what a future ETL and reporting platform could look like, given the current landscape of tools available. You can read the background to the project, how we developed prototype code, deployed it to Amazon, and evaluated tools for analysing the data.

(more...)

ETL Offload with Spark and Amazon EMR – Part 4 – Analysing the Data

We recently did a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic Map Reduce (EMR) Hadoop platform. The proof of concept we ran was on a very simple requirement, taking inbound files from a third party, joining to them to some reference data, and then making the result available for analysis.

The background to the project is here, I showed here how I built up (more...)

ETL Offload with Spark and Amazon EMR – Part 3 – Running pySpark on EMR

In the previous articles (here, and here) I gave the background to a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic Map Reduce (EMR) Hadoop platform. The proof of concept we ran was on a very simple requirement, taking inbound files from a third party, joining to them to some reference data, and then making the result available for analysis.

I showed here (more...)

ETL Offload with Spark and Amazon EMR – Part 2 – Code development with Notebooks and Docker

In the previous article I gave the background to a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic Map Reduce (EMR) Hadoop platform. The proof of concept we ran was on a very simple requirement, taking inbound files from a third party, joining to them to some reference data, and then making the result available for analysis. The primary focus was proving the end-to-end concept, with (more...)

ETL Offload with Spark and Amazon EMR – Part 1

We recently undertook a two-week Proof of Concept exercise for a client, evaluating whether their existing ETL processing could be done faster and more cheaply using Spark. They were also interested in whether something like Redshift would provide a suitable data warehouse platform for them. In this series of blog articles I will look at how we did this, and what we found.

Background

The client has an existing analytics architecture based primarily around Oracle (more...)

Source Control and Automated Code Deployment Options for OBIEE

It's Monday morning. I've arrived at a customer site to help them - ironically enough - with automating their OBIEE code management. But, on arrival, I'm told that the OBIEE team can't meet with me because someone did a release on the previous Friday, had now gone on holiday - and the wrong code was released but they didn't know which version. All hands-on-deck, panic-stations!

This actually happened to me, and in recent months too. (more...)

Enhanced Usage Tracking for OBIEE – Now Available as Open Source!

Introduction

OBIEE provides Usage Tracking as part of the core product functionality. It writes directly to a database table every Logical Query that hits the BI Server, including details of who ran it, when, and information about how it executed including for how long, how many rows, and so on. This in itself is a veritable goldmine of information about your OBIEE system. All OBIEE deployments should have Usage Tracking enabled, for supporting performance analysis, (more...)

The Rittman Mead Open Source Project

We have a strong innovation spirit at Rittman Mead, with all staff encouraged to use technology to its best advantage in order to do things with the software that haven't been done before. Some of these projects may may be 'scratching the itch' of a repeated manual task that should be automated. Others use technology to extend the capabilities of the tools or write new ones to fill gaps that have been identified.

At Rittman (more...)