Understanding Data Gravity as a DBA

Data gravity, and the friction it causes within the development cycle, is an incredibly obvious problem in my eyes.

Data gravity suffers from the von Neumann bottleneck, a basic limitation on how fast computers can be. The idea is pretty simple: the speed at which data moves between where it resides and where it’s processed is the limiting factor in computing speed.
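To make that concrete, here is a rough back-of-the-envelope sketch. The function names and the bandwidth and scan-rate figures are my own illustrative assumptions, not measurements from any real system. It simply compares the time to move a dataset with the time to process it once it has arrived:

```python
# Back-of-the-envelope model: a job is limited by whichever is slower,
# moving the data to the processor or processing it once it is there.
# All figures below are hypothetical, chosen only for illustration.

def transfer_seconds(data_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move the data from where it resides to where it is processed."""
    return data_bytes / bandwidth_bytes_per_s


def compute_seconds(data_bytes: float, process_bytes_per_s: float) -> float:
    """Time to process the data once it is local to the processor."""
    return data_bytes / process_bytes_per_s


def bottleneck(data_bytes: float,
               bandwidth_bytes_per_s: float,
               process_bytes_per_s: float) -> str:
    """Name the slower of the two stages."""
    moving = transfer_seconds(data_bytes, bandwidth_bytes_per_s)
    working = compute_seconds(data_bytes, process_bytes_per_s)
    return "transfer" if moving > working else "compute"


TB = 10**12
GB = 10**9

data = 1 * TB        # hypothetical 1 TB dataset
link = 1.25 * GB     # hypothetical 10 Gbit/s link, in bytes per second
scan = 5 * GB        # hypothetical scan rate of the processing engine

print("transfer:", transfer_seconds(data, link), "s")
print("compute: ", compute_seconds(data, scan), "s")
print("limited by:", bottleneck(data, link, scan))
```

Under these assumed numbers, moving the data takes several times longer than scanning it, which is the whole argument for bringing the processing to the data rather than the other way round.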

OLAP, DSS and VLDB DBAs are constantly in battle (more...)

Hadoop for Database Professionals – St. Louis (7. Sep)

Here’s some more free stuff by Gluent!

We are running another half-day course together with Cloudera, this time in St. Louis on 7. September 2017.

We will use our database background and explain, in database professionals’ terminology, why “new world” technologies like Hadoop will take over some parts of enterprise IT, why those platforms are so much better for advanced analytics over big datasets and how to use the right tool from the Hadoop ecosystem (more...)

Apache Impala Internals Deep Dive with Tanel Poder + Gluent New World Training Month

We are running a “Gluent New World training month” this July and have scheduled 3 webinars on the following Wednesdays!

The first webinar with Michael Rainey is going to cover modern alternatives to the traditional old-school “ETL on an RDBMS” approach for data integration and sharing. Then on the next Wednesday I will demonstrate some of the Apache Impala SQL engine’s internals, with commentary from an Oracle database geek’s angle (I plan to get pretty (more...)

The Snowflake Data Sharehouse. Wow!

With Snowflake Data Sharing, you can now easily transform your data into a valuable, strategic business asset.

Snowflake at Stoweflake

Every year the World Wide Data Vault Consortium (WWDVC) gets better and better! This year’s event was the 4th Annual and was again held at the lovely Stoweflake Mountain Lodge in Stowe, Vermont. And once again this year, my employer, Snowflake Computing, was a proud sponsor of the event. This year I even got to […]

Introduction to Oracle Big Data Cloud Service (Part III) – Ambari

This is the third blog post about Oracle Big Data Cloud Service. I continue to guide you through the Big Data Cloud Service and its components. In this blog post, I will introduce Ambari, the management service of our Hadoop cluster.

Apache Ambari simplifies provisioning, managing, and monitoring Apache Hadoop clusters. It’s the default management tool of Hortonworks Data Platform, but it can be used independently of Hortonworks. After you create your big (more...)

Introduction to Oracle Big Data Cloud Service – Compute Edition (Part II)

In my previous post, I gave a list of the services installed on an Oracle Big Data Cloud Service when you select “full” as the deployment profile. In this post, I’ll explain these services and software.

HDFS: HDFS is a distributed, scalable, and portable file system written in Java for Hadoop. It stores the data, so it is the main component of our cluster. A Hadoop (big data) cluster nominally has a single namenode plus a cluster (more...)

Introduction to Oracle Big Data Cloud Service – Compute Edition (Part I)

Over the last few years, Oracle has dedicated itself to cloud computing and is in a very tough race with its competitors. In order to stand out in this race, Oracle provides more services day by day. One of the services Oracle offers to the end user is “Oracle Big Data Cloud Service”. I examined this service by creating a trial account, and I decided to write a series of blog posts for those who (more...)

New Snowflake features released in Q1’17

This post provides an overview of the major new Snowflake features we released during Q1 of this year, and highlights the main value they provide.

Riga Dev Days 2017, new experiences in many ways.

Riga Dev Days 2017

General

It has been a while since my last blog-post.
One of the reasons is my shift from closed to open source software, databases more specifically. More on that in a later blog-post.

The reason for mentioning this already is the strange hybrid (what a popular word, these days) situation that I am in at the moment.
Thanks to the super enthusiastic, flexible and tenacious organizing team of the Riga Dev Days, (more...)

Snowflake and Spark, Part 2: Pushing Spark Query Processing to Snowflake

This post provides the details of Snowflake’s ability to push query processing down from Spark into Snowflake.

Cloud Analytics Conference – London!

Join Snowflake and The Data Warrior in London on June 1st for a Cloud Analytics Conference

Installing Hortonworks Data Platform 2.5 on Microsoft Azure

I presented this topic to the Big Data Meetup in Nottingham on Thursday but sometimes people prefer a blog to a presentation, so I’ve fashioned this article from the slides…

This article assumes the following:

Snowflake and Spark, Part 1: Why Spark? 

Snowflake Computing is making great strides in the evolution of our Elastic DWaaS in the cloud. Here is a recent update from engineering and product management on our integration with Spark: This is the first post in an ongoing series describing Snowflake’s integration with Spark. In this post, we introduce the Snowflake Connector for Spark (package […]

Machine Learning Algorithm Cheat Sheet

Apr 10, 2017

With so many algorithms around, it’s always a struggle to find out which algorithm could be suitable for the problem I want to solve. Microsoft has done an amazing job to start with. Please find attached the Machine Learning Algorithm Cheat Sheet.


Hope This Helps

Sunil S Ranka


I’m speaking at Advanced Spark Meetup & attending Deep Learning Workshop in San Francisco

In case you are interested in the “New World” and happen to be in the Bay Area this week (19 & 21 Jan 2017), there are two interesting events that you might want to attend (I’ll speak at one and attend the other):

Advanced Spark and TensorFlow Meetup

I’m speaking at the Advanced Apache Spark meetup and showing different ways of profiling applications, with the main focus on CPU efficiency. This is a free Meetup in San Francisco hosted (more...)

GNW05 – Extending Databases With the Full Power of Hadoop: How Gluent Does It

It’s time to announce the next webinar in the Gluent New World series. This time I will deliver it myself (and let’s have some fun :-)

Details below:

GNW05 – Extending Databases With the Full Power of Hadoop: How Gluent Does It

NB! If you want to move to the "New World" - offload your data and workloads to Hadoop, without having to re-write your existing applications - check out Gluent. We are making history! (more...)

Gluent Podcast with Mark Rittman

Mark Rittman has been publishing his podcast series (Drill to Detail) for a while now and I sat down with him at the UKOUG Tech 2016 conference to discuss Gluent and its place in the new world.

This podcast episode is about 49 minutes long and it explains the reasons why I decided to go on to build Gluent a couple of years ago and where I see the enterprise data world going in (more...)

Gluent New World #03: Real Time Stream Processing in Modern Enterprises with Gwen Shapira

It’s time to announce the 3rd episode of the Gluent New World webinar series! This time Gwen Shapira will talk about Kafka as a key data infrastructure component of a modern enterprise. And I will ask questions from an old database guy’s viewpoint :)

Apache Kafka and Real Time Stream Processing

Speaker:

  • Gwen Shapira (Confluent)
  • Gwen is a system architect at Confluent helping customers achieve
    success with their Apache Kafka implementation. She has 15 years of
    (more...)

Database Migration and Integration using AWS DMS



Amazon Web Services (AWS) recently released a product called AWS Database Migration Service (DMS) to migrate data between databases.

The experiment

I have used AWS DMS to try a migration from a source MySQL database to a target MySQL database, a homogeneous database migration.

DMS puts a resource in the middle, the Replication Instance (an automatically created EC2 instance), plus source and target Endpoints. Then you move data from the source (more...)