Some impressions from Oracle Analytics Cloud–taken from keynote at Oracle OpenWorld 2017

In his keynote on October 3rd during Oracle OpenWorld 2017, Thomas Kurian stated that the vision at Oracle around analytics has changed quite considerably. He explained this change and the new vision using this slide.

image

All kinds of data, all kinds of users, many more ways to present and visualize and machine generated insights to complement human understanding.

The newly launched Analytics Cloud supports this vision.

image 

Zooming in on Data Preparation:

image

And from cleansed (more...)

Hadoop for Database Professionals – St. Louis (7. Sep)

Here’s some more free stuff by Gluent!

We are running another half-day course together with Cloudera, this time in St. Louis on 7. September 2017.

We will use our database background and explain using database professionals terminology why “new world” technologies like Hadoop will take over some parts of the enterprise IT, why are those platforms so much better for advanced analytics over big datasets and how to use the right tool from Hadoop ecosystem (more...)

Oracle BDCSCE Upgraded: Zeppelin 0.7 and Spark 2.1

Last week, Oracle Big Data Cloud Service – Compute Edition was upgraded from 17.2.5 to 17.3.1-20. I do not know if the new version is still in testing phase and available to only trial users, but sooner or later the new version will be available to all Oracle Cloud users.

The new version is still based on HDP 2.4.2 but it contains upgrades on two key components: Zeppelin and (more...)

Introduction to Oracle Big Data Cloud Service – Compute Edition (Part VI) – Hive

I though I would stop writing about “Oracle Big Data Cloud Service – Compute Edition” after my fifth blog post, but then I noticed that I didn’t mention about the Apache Hive, another important component of the Big Data. Hive is a data warehouse infrastructure built on top of Hadoop, designed to work with large datasets. Why is it so important? Because it includes support for SQL (SQL:2003 and SQL:2011), and helps users to utilize (more...)

Apache Impala Internals Deep Dive with Tanel Poder + Gluent New World Training Month

We are running a “Gluent New World training month” in this July and have scheduled 3 webinars on following Wednesdays for this!

The first webinar with Michael Rainey is going to cover modern alternatives to the traditional old-school “ETL on a RDBMS” approach for data integration and sharing. Then on the next Wednesday I will demonstrate some Apache Impala SQL engine’s internals, with commentary from an Oracle database geek’s angle (I plan to get pretty (more...)

Introduction to Oracle Big Data Cloud Service – Compute Edition (Part V) – Pig

This is my last blog post of my introduction series for Oracle Big Data Cloud Service – Compute Edition. In this blog post, I’ll mention “Apache Pig”. It’s a tool/platform created by “Yahoo!” to analyze large data sets without the complexities of writing a traditional MapReduce program. It’s designed to process any kind of data (structured or unstructured) so it’s a great tool for ETL jobs. Pig comes installed and ready to use with (more...)

Introduction to Oracle Big Data Cloud Service (Part IV) – Zeppelin

This is my forth blog post about Oracle Big Data Cloud Service. In my previous blog posts, I showed how we can create a big data cloud service on Oracle Cloud, which services are installed by default, ambari management service and now it’s time to write about how we can work with data using Apache Zeppelin. Apache Zeppelin is a web-based notebook that enables interactive data analytics. Zeppelin is not the only way to work (more...)

Introduction to Oracle Big Data Cloud Service (Part III) – Ambari

This is the third blog post about Oracle Big Data Cloud Service. I continue to guide you about the Big Data Cloud Service and its components. In this blog post, I will introduce Ambari – the management service of our hadoop cluster.

The Apache Ambari simplifies provisioning, managing, and monitoring Apache Hadoop clusters. It’s the default management tool of Hortonworks Data Platform but it can be used independently from Hortonworks. After you create your big (more...)

Introduction to Oracle Big Data Cloud Service – Compute Edition (Part I)

Over the last few years, Oracle has dedicated to cloud computing and they are in a very tough race with its competitors. In order to stand out in this race, Oracle provides more services day by day. One of the services Oracle offers to the end user is “Oracle Big Data Cloud Service”. I examined this service by creating a trial account, and I decided to write a series of blog posts for those who (more...)

Installing Hortonworks Data Platform 2.5 on Microsoft Azure

I presented this topic to the Big Data Meetup in Nottingham on Thursday but sometimes people prefer a blog to a presentation, so I’ve fashioned this article from the slides…

This article assumes the following:

GNW05 – Extending Databases With the Full Power of Hadoop: How Gluent Does It

It’s time to announce the next webinar in the Gluent New World series. This time I will deliver it myself (and let’s have some fun :-)

Details below:

GNW05 – Extending Databases With the Full Power of Hadoop: How Gluent Does It

NB! If you want to move to the "New World" - offload your data and workloads to Hadoop, without having to re-write your existing applications - check out Gluent. We are making history! (more...)

Are analytic RDBMS and data warehouse appliances obsolete?

I used to spend most of my time — blogging and consulting alike — on data warehouse appliances and analytic DBMS. Now I’m barely involved with them. The most obvious reason is that there have been drastic changes in industry structure:

Gluent New World #03: Real Time Stream Processing in Modern Enterprises with Gwen Shapira

It’s time to announce the 3rd episode of Gluent New World webinar series! This time Gwen Shapira will talk about Kafka as a key data infrastructure component of a modern enterprise. And I will ask questions from a old database guy’s viewpoint :)

Apache Kafka and Real Time Stream Processing

Speaker:

  • Gwen Shapira (Confluent)
  • Gwen is a system architect at Confluent helping customers achieve
    success with their Apache Kafka implementation. She has 15 years of
    (more...)

Gluent New World #02: SQL-on-Hadoop with Mark Rittman

It’s time to announce the 2nd episode of the Gluent New World webinar series!

The Gluent New World webinar series is about modern data management: architectural trends in enterprise IT and technical fundamentals behind them.

GNW02: SQL-on-Hadoop : A bit of History, Current State-of-the-Art, and Looking towards the Future

Speaker:

  • This GNW episode is presented by no other than Mark Rittman, the co-founder & CTO of Rittman Mead and an all-around guru of enterprise BI!

(more...)

Gluent Demo Video Launch

Although we are still in stealth mode (kind-of), due to the overwhelming requests for information, we decided to publish a video about what we do :)

It’s a short 5-minute video, just click on the image below or go straight to http://gluent.com:

Gluent Demo video

And this, by the way, is just the beginning.

Gluent is getting close to 20 people now, distributed teams in US and UK – and we are still hiring!

 

 

 

NB! (more...)

More Animals in Big Data Zoo – Big Data Landscape for 2016

| Mar 25, 2016

Hi All

While surfing net stumbled upon Big Data Landscape for 2016 image and it was very impressive to see many more new Animals in Big Data Zoo.

 

New Animals

Hope This Helps

Sunil S Ranka


Big Data – Tez, MR, Spark Execution Engine : Performance Comparison

| Feb 25, 2016

There is no question that massive data is being generated in greater volumes than ever before. Along with the traditional data set, new data sources as sensors, application logs, IOT devices, and social networks are adding to data growth. Unlike traditional ETL platforms like Informatica, ODI, DataStage that are largely proprietary commercial products, the majority of Big ETL platforms are powered by open source.

With many execution engines, customers are always curious about their usage (more...)

Readings in Database Systems

Mike Stonebraker and Larry Ellison have numerous things in common. If nothing else:

  • They’re both titanic figures in the database industry.
  • They both gave me testimonials on the home page of my business website.
  • They both have been known to use the present tense when the future tense would be more accurate. :)

I mention the latter because there’s a new edition of Readings in Database Systems, aka the Red Book, available online, courtesy of (more...)

Couchbase 4.0 and related subjects

I last wrote about Couchbase in November, 2012, around the time of Couchbase 2.0. One of the many new features I mentioned then was secondary indexing. Ravi Mayuram just checked in to tell me about Couchbase 4.0. One of the important new features he mentioned was what I think he said was Couchbase’s “first version” of secondary indexing. Obviously, I’m confused.

Now that you’re duly warned, let me remind you of aspects of (more...)

Convert CSV file to Apache Parquet… with Drill

Read this article on my new blog A very common use case when working with Hadoop is to store and query simple files (CSV, TSV, ...); then to get better performance and efficient storage convert these files into more efficient format, for example Apache Parquet. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Apache Parquet has the following