Using Spark to Process Data From Cassandra for Analytics

After my presentation about Apache Cassandra, most people asked whether they could run analytical queries on Cassandra, and how they could integrate Spark with Cassandra. So I decided to write a blog post to demonstrate how we can process data from Cassandra using Spark. In this blog post, I’ll show how to build a test environment on Oracle Cloud (Spark + Cassandra), load sample data into Cassandra, and query the data using Spark.

Let (more...)
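To give a flavour of where this is heading, here is a minimal PySpark sketch of reading a Cassandra table into a DataFrame and running an analytical query over it. The keyspace, table and column names are placeholders of my own, and it assumes the DataStax spark-cassandra-connector package has been made available to Spark (for example via --packages on spark-submit).

    # Minimal sketch: Cassandra table -> Spark DataFrame -> aggregation.
    # Assumes the spark-cassandra-connector is on the classpath and that a
    # hypothetical keyspace "test" with a table "sales" already exists.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("cassandra-analytics")
             .config("spark.cassandra.connection.host", "127.0.0.1")  # Cassandra node
             .getOrCreate())

    sales = (spark.read
             .format("org.apache.spark.sql.cassandra")
             .options(keyspace="test", table="sales")  # placeholder keyspace/table
             .load())

    # A simple analytical query: total amount per product
    sales.groupBy("product_id").sum("amount").show()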

Ingesting data into Hive using Spark

Before heading off for my hols at the end of this week to soak up some sun with my gorgeous princess, I thought I would write another blog post. Two posts in a row – that has not happened for ages 🙂

 

My previous post was about loading a text file into Hadoop using Hive. Today I am looking to load the same file into a Hive table, but using Spark (more...)
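As a rough sketch of the idea (not the exact code from the post), loading a delimited text file into a Hive table from PySpark can look like the following. The HDFS path and table name are placeholders, and it assumes the Spark session is built with Hive support.

    # Sketch: read a CSV from HDFS and save it as a Hive table via Spark.
    # Path and table name are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("csv-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    df = (spark.read
          .option("header", "true")         # first line holds column names
          .option("inferSchema", "true")    # let Spark guess the column types
          .csv("hdfs:///data/sample.csv"))  # placeholder HDFS path

    # Persist the DataFrame as a managed Hive table
    df.write.mode("overwrite").saveAsTable("default.sample_table")

    spark.sql("SELECT COUNT(*) FROM default.sample_table").show()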

Loading data in Hadoop with Hive

It’s been a busy month but 2018 begins with a new post.

 

A common Big Data scenario is to use Hadoop for data ingestion and transformation – in other words, using Hadoop for ETL.

In this post, I will show an example of how to load a comma-separated values text file into HDFS. Once the file is in HDFS, we use Apache Hive to create a table and load the data into (more...)
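To give a feel for the HiveQL such a load typically involves, here is a sketch with placeholder paths, columns and delimiter. The statements are issued through spark.sql purely to keep the examples on this page in Python; the same HiveQL runs unchanged from the Hive CLI or beeline.

    # Sketch of a typical CSV-into-Hive load. Assumes the file has already been
    # copied to HDFS, e.g. with: hdfs dfs -put sample.csv /user/demo/
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Table over comma-separated text (placeholder columns)
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sample_table (
            id     INT,
            name   STRING,
            amount DOUBLE
        )
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
    """)

    # Move the file from its HDFS location into the table
    spark.sql("LOAD DATA INPATH '/user/demo/sample.csv' INTO TABLE sample_table")

    spark.sql("SELECT * FROM sample_table LIMIT 5").show()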

Using Spark to join data from CSV and MySQL Table

Yesterday, I explained how we can access a MySQL database from Zeppelin, which comes with Oracle Big Data Cloud Service Compute Edition (BDCSCE). Although we can use Zeppelin to access MySQL, we still need something more powerful to combine data from two different sources (for example, data from a CSV file and RDBMS tables). Spark is a great choice for processing data. In this blog post, I’ll write a simple PySpark (Python for Spark) script which will (more...)
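A minimal sketch of that pattern is below; the hosts, tables and column names are hypothetical, and the MySQL JDBC driver is assumed to be on Spark's classpath (for example via --jars).

    # Sketch: join a CSV file with a MySQL table in PySpark.
    # All connection details and column names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-mysql-join").getOrCreate()

    orders = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("hdfs:///data/orders.csv"))       # placeholder CSV path

    customers = (spark.read.format("jdbc")
                 .option("url", "jdbc:mysql://mysqlhost:3306/shop")  # placeholder
                 .option("dbtable", "customers")
                 .option("user", "dbuser")
                 .option("password", "dbpass")
                 .option("driver", "com.mysql.jdbc.Driver")
                 .load())

    # Join the two sources on a shared key and aggregate
    (orders.join(customers, orders.customer_id == customers.id)
           .groupBy(customers.country)
           .count()
           .show())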

Using Zeppelin to Access MySQL

If you want to access MySQL Cloud Service using Zeppelin on Oracle Big Data Cloud Service Compute Edition (BDCSCE), you can use Spark DataFrames or Zeppelin interpreters. In this blog post, I’ll show how we can edit the JDBC interpreter to connect to MySQL Cloud Service.

First, log in to the Oracle Big Data Cloud console, go to the “settings” tab, and open the “notebook” section. You’ll see the interpreter settings. Search for “jdbc” and click the “edit” button to edit (more...)
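For reference, the JDBC interpreter properties that typically need editing look roughly like this; the host, database and credentials are placeholders, and the MySQL JDBC driver has to be added under the interpreter's dependencies (artifact mysql:mysql-connector-java:<version>) so Zeppelin can load it.

    default.driver    com.mysql.jdbc.Driver
    default.url       jdbc:mysql://<mysql-host>:3306/<database>
    default.user      <username>
    default.password  <password>

After saving and restarting the interpreter, a notebook paragraph starting with %jdbc should be able to run SQL against the MySQL service.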

My Year in Review, 2017

So where did 2017 go?!?!?  I really, really would like to know…  Needless to say, it’s time to do a year in review already and I actually have time to do it this year!

DBAKevlar Blog

I wrote over 100 blog posts this year across DBAKevlar, Delphix and partner sites, and I’ve enjoyed sharing with the different communities.  There are significant changes going on in the IT world regarding the future of the (more...)

DBMS_COMPRESSION can be run in parallel, at least from 11.2.0.4

I was trying to use DBMS_COMPRESSION on an 11gR2 (11.2.0.4 on RHEL 6.3) database the other day and was helped by this article from Neil Johnson. I was having some trouble with permissions until I read that article, which nicely summarises everything you need to know – thanks Neil!

One thing I did notice is that Neil stated that you can’t parallelise the calls to the advisor since it uses the (more...)

Okta SSO with Snowflake 

Ever wonder how to secure a cloud data warehouse?

Big Data Marathon

This week there is a Big Data event in London, gathering Big Data clients, geeks and vendors from all over to speak on the latest trends, projects, platforms and products. It helps everyone stay on the same page, align the steering wheel, and get a feel for where the fast-paced technology world is going. The event is massive, but I am glad I could make it, even if only for one hour (more...)

Join the Cloud Analytics Academy

Maybe not as cool as Star Fleet Academy, but this is pretty cool. Snowflake and a number of our partners have come together to create the first self-paced, vendor-agnostic online training academy for analytics in the cloud. This academy will get you up to speed on what is happening today in the cloud with […]

Hadoop for Database Professionals class at NoCOUG Fall Conference on 9th Nov

If you happen to be in the Bay Area on Thursday 9th November, come check out the NoCOUG Fall Conference at California State University in downtown Oakland, CA.

Gluent is delivering a Hadoop for Database Professionals class as a separate track there (with myself and Michael Rainey as speakers) where we’ll explain the basics & concepts of modern distributed data processing platforms and then show a bunch of Hadoop demos too (mostly SQL-on-Hadoop stuff (more...)

Hadoop for Database Professionals – St. Louis (7. Sep)

Here’s some more free stuff by Gluent!

We are running another half-day course together with Cloudera, this time in St. Louis on 7. September 2017.

We will use our database background and explain, in database professionals’ terminology, why “new world” technologies like Hadoop will take over some parts of enterprise IT, why those platforms are so much better for advanced analytics over big datasets, and how to use the right tool from the Hadoop ecosystem (more...)

Apache Impala Internals Deep Dive with Tanel Poder + Gluent New World Training Month

We are running a “Gluent New World training month” this July and have scheduled three webinars on the following Wednesdays!

The first webinar, with Michael Rainey, is going to cover modern alternatives to the traditional old-school “ETL on an RDBMS” approach for data integration and sharing. Then, on the next Wednesday, I will demonstrate some of the Apache Impala SQL engine’s internals, with commentary from an Oracle database geek’s angle (I plan to get pretty (more...)

Installing Hortonworks Data Platform 2.5 on Microsoft Azure

I presented this topic to the Big Data Meetup in Nottingham on Thursday but sometimes people prefer a blog to a presentation, so I’ve fashioned this article from the slides…

This article assumes the following:

Machine Learning Algorithm Cheat Sheet

Apr 10, 2017

With so many algorithms around, it’s always a struggle to figure out which algorithm could be suitable for the problem statement I want to solve. Microsoft has done an amazing job to help you get started. Please find attached the Machine Learning Algorithm Cheat Sheet.

Hope This Helps

Sunil S Ranka


I’m speaking at Advanced Spark Meetup & attending Deep Learning Workshop in San Francisco

In case you are interested in the “New World” and happen to be in the Bay Area this week (19 & 21 Jan 2017), there are two interesting events that you might want to attend (I’ll speak at one and attend the other):

Advanced Spark and TensorFlow Meetup

I’m speaking at the Advanced Apache Spark meetup, showing different ways of profiling applications with the main focus on CPU efficiency. This is a free Meetup in San Francisco hosted (more...)

GNW05 – Extending Databases With the Full Power of Hadoop: How Gluent Does It

It’s time to announce the next webinar in the Gluent New World series. This time I will deliver it myself (and let’s have some fun :-)

Details below:

GNW05 – Extending Databases With the Full Power of Hadoop: How Gluent Does It

NB! If you want to move to the "New World" - offload your data and workloads to Hadoop, without having to re-write your existing applications - check out Gluent. We are making history! (more...)

Gluent Podcast with Mark Rittman

Mark Rittman has been publishing his podcast series (Drill to Detail) for a while now, and I sat down with him at the UKOUG Tech 2016 conference to discuss Gluent and its place in the new world.

This podcast episode is about 49 minutes long, and it explains why I decided to build Gluent a couple of years ago and where I see the enterprise data world going in (more...)

Database Migration and Integration using AWS DMS



Amazon Web Services (AWS) recently released a product called AWS Database Migration Service (DMS) to migrate data between databases.

The experiment

I have used AWS DMS to try a migration from a source MySQL database to a target MySQL database, a homogeneous database migration.

The DMS service uses a resource in the middle, a Replication Instance - an automatically created EC2 instance - plus source and target Endpoints. Then you move data from the source (more...)
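As a rough illustration of those moving parts, a boto3 sketch of creating the replication instance and the two endpoints could look like the following; every identifier, host and credential is a hypothetical placeholder, and the replication task that actually moves the data is only hinted at in the final comment.

    # Sketch of the three DMS pieces: replication instance + source/target endpoints.
    import boto3

    dms = boto3.client("dms", region_name="us-east-1")

    instance = dms.create_replication_instance(
        ReplicationInstanceIdentifier="dms-demo-instance",
        ReplicationInstanceClass="dms.t2.medium",
        AllocatedStorage=50,
    )
    print(instance["ReplicationInstance"]["ReplicationInstanceArn"])

    source = dms.create_endpoint(
        EndpointIdentifier="mysql-source",
        EndpointType="source",
        EngineName="mysql",
        ServerName="source-mysql.example.com",
        Port=3306,
        Username="repl_user",
        Password="secret",
    )

    target = dms.create_endpoint(
        EndpointIdentifier="mysql-target",
        EndpointType="target",
        EngineName="mysql",
        ServerName="target-mysql.example.com",
        Port=3306,
        Username="repl_user",
        Password="secret",
    )

    # A replication task would then tie the endpoint and instance ARNs together
    # with table mappings, and be started to perform the actual migration.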

Gear up for #AIOUG OTN Yathra’ 2016

Guys, AIOUG is back again with OTN Yathra’ 2016. It is a series of technology evangelist events organized by the All India Oracle Users Group in six cities, touring across the length and breadth of the country. It was my extreme pleasure to be a part of it in 2015, and I’m pleased to announce that … Continue reading