Gluent New World #02: SQL-on-Hadoop with Mark Rittman

It’s time to announce the 2nd episode of the Gluent New World webinar series!

The Gluent New World webinar series is about modern data management: architectural trends in enterprise IT and technical fundamentals behind them.

GNW02: SQL-on-Hadoop : A bit of History, Current State-of-the-Art, and Looking towards the Future

Speaker:

  • This GNW episode is presented by no other than Mark Rittman, the co-founder & CTO of Rittman Mead and an all-around guru of enterprise BI!

(more...)

GoldenGate 12.2 Big Data Adapters: part 3 – Kafka

This post continues my review of GoldenGate Big Data adapters started by review of HDFS and FLUME adapters. Here is list of all posts in the series:

  1. GoldenGate 12.2 Big Data Adapters: part 1 – HDFS
  2. GoldenGate 12.2 Big Data Adapters: part 2 – Flume
  3. GoldenGate 12.2 Big Data Adapters: part 3 – Kafka

In this article I will try the Kafka adapter and see how it works. Firstly, I think it (more...)

Gluent Demo Video Launch

Although we are still in stealth mode (kind-of), due to the overwhelming requests for information, we decided to publish a video about what we do :)

It’s a short 5-minute video, just click on the image below or go straight to http://gluent.com:

Gluent Demo video

And this, by the way, is just the beginning.

Gluent is getting close to 20 people now, distributed teams in US and UK – and we are still hiring!

 

 

 

NB! (more...)

GNW01: In-Memory Processing for Databases

Hi, it took a bit longer than I had planned, but here’s the first Gluent New World webinar recording!

You can also subscribe to our new Vimeo channel here – I will announce the next event with another great speaker soon ;-)

A few comments:

  • Slides are here
  • I’ll figure a good way to deal with offline follow-up Q&A later on, after we’ve done a few of these events

If you like this (more...)

More Animals in Big Data Zoo – Big Data Landscape for 2016

| Mar 25, 2016

Hi All

While surfing net stumbled upon Big Data Landscape for 2016 image and it was very impressive to see many more new Animals in Big Data Zoo.

 

New Animals

Hope This Helps

Sunil S Ranka


SQL is Huuuuuuuge (Making SQL Great Again) Part III

Lightly-edited partial transcript of a panel discussion titled “Making SQL Great Again (SQL is Huuuuuge)” at YesSQL Summit 2016 organized by the Northern California Oracle Users Group (NoCOUG) at Oracle Corporation’s headquarters in Redwood City, California. NoCOUG is the longest-running and most-active Oracle users group in the world. An individual membership only costs $95 and entitles the member to free admission to the four consecutive quarterly NoCOUG conferences (one-day events) that follow the membership’s start (more...)

SQL is Huuuuuuuge (Making SQL Great Again) Part II

Lightly-edited partial transcript of a panel discussion titled “Making SQL Great Again (SQL is Huuuuuge)” at YesSQL Summit 2016 organized by the Northern California Oracle Users Group (NoCOUG) at Oracle Corporation’s headquarters in Redwood City, California. NoCOUG is the longest-running and most-active Oracle users group in the world. An individual membership only costs $95 and entitles the member to free admission to the four consecutive quarterly NoCOUG conferences (one-day events) that follow the membership’s start (more...)

Making SQL Great Again (SQL is Huuuuuge) Part VI

Lightly-edited partial transcript of a panel discussion titled “Making SQL Great Again (SQL is Huuuuuge)” at YesSQL Summit 2016 organized by the Northern California Oracle Users Group (NoCOUG) at Oracle Corporation’s headquarters in Redwood City, California. NoCOUG is the longest-running and most-active Oracle users group in the world. An individual membership only costs $95 and entitles the member to free admission to the four consecutive quarterly NoCOUG conferences (one-day events) that follow the membership’s start (more...)

Making SQL Great Again (SQL is Huuuuuge) Part IV

Lightly-edited partial transcript of a panel discussion titled “Making SQL Great Again (SQL is Huuuuuge)” at YesSQL Summit 2016 organized by the Northern California Oracle Users Group (NoCOUG) at Oracle Corporation’s headquarters in Redwood City, California. NoCOUG is the longest-running and most-active Oracle users group in the world. An individual membership only costs $95 and entitles the member to free admission to the four consecutive quarterly NoCOUG conferences (one-day events) that follow the membership’s start (more...)

Making SQL Great Again (SQL is Huuuuuge) Part V

Lightly-edited partial transcript of a panel discussion titled “Making SQL Great Again (SQL is Huuuuuge)” at YesSQL Summit 2016 organized by the Northern California Oracle Users Group (NoCOUG) at Oracle Corporation’s headquarters in Redwood City, California. NoCOUG is the longest-running and most-active Oracle users group in the world. An individual membership only costs $95 and entitles the member to free admission to the four consecutive quarterly NoCOUG conferences (one-day events) that follow the membership’s start (more...)

Making SQL Great Again (SQL is Huuuuuge) Part VII

Lightly-edited partial transcript of a panel discussion titled “Making SQL Great Again (SQL is Huuuuuge)” at YesSQL Summit 2016 organized by the Northern California Oracle Users Group (NoCOUG) at Oracle Corporation’s headquarters in Redwood City, California. NoCOUG is the longest-running and most-active Oracle users group in the world. An individual membership only costs $95 and entitles the member to free admission to the four consecutive quarterly NoCOUG conferences (one-day events) that follow the membership’s start (more...)

Big Data – Tez, MR, Spark Execution Engine : Performance Comparison

| Feb 25, 2016

There is no question that massive data is being generated in greater volumes than ever before. Along with the traditional data set, new data sources as sensors, application logs, IOT devices, and social networks are adding to data growth. Unlike traditional ETL platforms like Informatica, ODI, DataStage that are largely proprietary commercial products, the majority of Big ETL platforms are powered by open source.

With many execution engines, customers are always curious about their usage (more...)

Spark versus Flink

Spark is an open source Apache project that provides a framework for multi stage in-memory analytics. Spark is based on the Hadoop platform and can interface with Cassandra OpenStack Swift, Amazon S3, Kudu and HDFS. Spark comes with a suite of analytic and machine learning algorithm allowing you to perform a wide variety of analytics on you distribute Hadoop platform. This allows you to generate data insights, data enrichment and data aggregations for storage on (more...)

My BIWA Summit Presentations

Here are the two BIWA Summit 2016 presentations I delivered today. The first one is a collection of high level thoughts (and opinions) of mine and the 2nd one is more technical:

 

NB! If you want to move to the "New World" - and benefit from the awesomeness of Hadoop, without having to re-engineer your existing applications - (more...)

Readings in Database Systems

Mike Stonebraker and Larry Ellison have numerous things in common. If nothing else:

  • They’re both titanic figures in the database industry.
  • They both gave me testimonials on the home page of my business website.
  • They both have been known to use the present tense when the future tense would be more accurate. :)

I mention the latter because there’s a new edition of Readings in Database Systems, aka the Red Book, available online, courtesy of (more...)

Gluent launch! New production release, new HQ, new website!

I’m happy to announce that the last couple of years of hard work is paying off and the Gluent Offload Engine is production now! After beta testing with our early customers, we are now out of complete stealth mode and are ready talk more about what exactly are we doing :-)

Check out our new website and product & use case info here!

Follow us on Twitter:

We are hiring! Need to fill (more...)

Couchbase 4.0 and related subjects

I last wrote about Couchbase in November, 2012, around the time of Couchbase 2.0. One of the many new features I mentioned then was secondary indexing. Ravi Mayuram just checked in to tell me about Couchbase 4.0. One of the important new features he mentioned was what I think he said was Couchbase’s “first version” of secondary indexing. Obviously, I’m confused.

Now that you’re duly warned, let me remind you of aspects of (more...)

Convert CSV file to Apache Parquet… with Drill

Read this article on my new blog A very common use case when working with Hadoop is to store and query simple files (CSV, TSV, ...); then to get better performance and efficient storage convert these files into more efficient format, for example Apache Parquet. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Apache Parquet has the following

Hive (HiveQL) SQL for Hadoop Big Data



In this  post I will share my experience with an Apache Hadoop component called Hive which enables you to do SQL on an Apache Hadoop Big Data cluster.

Being a great fun of SQL and relational databases, this was my opportunity to set up a mechanism where I could transfer some (a lot)  data from a relational database into Hadoop and query it with SQL. Not a very difficult thing to do these days, actually (more...)

Notes on analytic technology, May 13, 2015

1. There are multiple ways in which analytics is inherently modular. For example:

  • Business intelligence tools can reasonably be viewed as application development tools. But the “applications” may be developed one report at a time.
  • The point of a predictive modeling exercise may be to develop a single scoring function that is then integrated into a pre-existing operational application.
  • Conversely, a recommendation-driven website may be developed a few pages — and hence also a few (more...)