Apache Drill is an engine that can connect to many different data sources and provide a SQL interface to them. It's not just a wanna-be SQL interface that trips up on anything complex - it's a hugely functional one, including support for many built-in functions as well as windowing functions. Whilst it can connect to standard data sources that you'd be able to query with SQL anyway, like Oracle or MySQL, it can also (more...)
In my previous post I looked at the latest release of Oracle Stream Analytics (OSA), and saw how it provided a graphical interface to "Fast Data". Users can analyse streaming data as it arrives based on conditions and rules. They can also transform the stream data, publishing it back out as a stream in its own right. In this article we'll see how OSA can be used with Kafka.
Kafka is one of the foremost streaming (more...)
Oracle Stream Analytics (OSA) is a graphical tool that provides “Business Insight into Fast Data”. In layman’s terms, that translates into an intuitive web-based interface for exploring, analysing, and manipulating streaming data sources in realtime. These sources can include REST and JMS queues, as well as Kafka. The inclusion of Kafka opens OSA up to integration with many new-build data pipelines that use this as a backbone technology. Previously known as Oracle Stream Explorer, it is (more...)
Oracle's Big Data Discovery encompasses a good amount of exploration, transformation, and visualisation capabilities for datasets residing in your organisation’s data reservoir. Even with this though, there may come a time when your data scientists want to unleash their R magic on those same datasets. Perhaps the data domain expert has used BDD to enrich and cleanse the data, and now it's ready for some statistical analysis? Maybe you'd like to use R's excellent forecast (more...)
Big Data Discovery (BDD) is a great tool for exploring, transforming, and visualising data stored in your organisation’s Data Reservoir. I presented a workshop on it at a recent conference, and got an interesting question from the audience that I thought I’d explore further here. Currently the primary route for getting data into BDD requires that it be (i) in HDFS and (ii) have a Hive table defined on top of it. From there, (more...)
I’ve been meaning to write about Apache Spark for quite some time now – I’ve been working with a few of my customers and I find this framework powerful, practical, and useful for a lot of big data use cases. For those of you who don’t know about Apache Spark, here is a short introduction.
Apache Spark is a framework for distributed calculation and handling of big data. Like Hadoop, it uses a clustered environment in (more...)
New in Big Data Discovery 1.2 is the addition of BDD Shell, an integration point with Python. This exposes the datasets and BDD functionality in a Python and PySpark environment, opening up huge possibilities for advanced data science work on BDD datasets. With the ability to push back to Hive and thus BDD data modified in this environment, this is important functionality that will make BDD even more useful for navigating and exploring (more...)
New in Big Data Discovery 1.2 is the addition of BDD Shell, an integration point with Python. This exposes the datasets and BDD functionality in a Python and PySpark environment, opening up huge possibilities for advanced data science work on BDD datasets, particularly when used in conjunction with Jupyter Notebooks. With the ability to push back to Hive and thus BDD data modified in this environment, this is important functionality that will make BDD (more...)
It’s time to announce the 3rd episode of the Gluent New World webinar series! This time Gwen Shapira will talk about Kafka as a key data infrastructure component of a modern enterprise. And I will ask questions from an old database guy’s viewpoint :)
Apache Kafka and Real Time Stream Processing
Amazon Web Services (AWS) recently released a product called AWS Database Migration Service (DMS) to migrate data between databases.
I have used AWS DMS to try a migration from a source MySQL database to a target MySQL database, a homogeneous database migration.
The DMS service puts a resource in the middle, the Replication Instance (an automatically created EC2 instance), alongside source and target Endpoints. Then you move data from the source (more...)
It’s time to announce the 2nd episode of the Gluent New World webinar series!
The Gluent New World webinar series is about modern data management: architectural trends in enterprise IT and technical fundamentals behind them.
GNW02: SQL-on-Hadoop : A bit of History, Current State-of-the-Art, and Looking towards the Future
- This GNW episode is presented by none other than Mark Rittman, the co-founder & CTO of Rittman Mead and an all-around guru of enterprise BI!
Although we are still in stealth mode (kind-of), due to the overwhelming requests for information, we decided to publish a video about what we do :)
It’s a short 5-minute video, just click on the image below or go straight to http://gluent.com:
And this, by the way, is just the beginning.
Gluent is getting close to 20 people now, with distributed teams in the US and UK – and we are still hiring!