Examples of using Machine Learning on Video and Photo in Public

Over the past 18 months or so, most examples of machine learning have involved looking at images and identifying objects in them. There are the typical examples of examining pictures looking for a cat or a dog, or some famous person, etc. Most of these examples are very simplistic, although they do illustrate important concepts.

But what if this same technology was used to monitor people going about their daily lives? (more...)

Connecting Go Lang to Oracle Database

It seems like more and more people are using Go. With that comes the need to access a database or databases. This blog post will show you how to connect to an Oracle Database and perform some basic operations using Go.

The first thing you need is to have Go installed. There are a couple of options for you. The first is to download it from the Go Lang website, or if you are (more...)

HiveMall: Transform Categorical features to Numerical

HiveMall is a machine learning library that sits on top of Hive and provides a SQL interface to a wide range of data preparation and machine learning algorithms.

A common task in many machine learning exercises is to convert the data from the format it is captured in (raw data) into a format that is required by the machine learning algorithms. Most ML tools will either have functionality built into the algorithms to do this automatically (more...)

Migrating Python ML Models to other languages

In a previous blog post I mentioned experiencing some performance issues when using Python ML in production. We needed something quicker, and the possible languages we considered were C, C++, Java and Go Lang.

But the data science team used R and Python, with just a few more people using Python than R on the team.

One option was to rewrite everything into the language used in production. As you can imagine no-one wanted (more...)

Machine Learning with Go Lang

Recently I’ve been having a number of conversations with people in several countries about using Go Lang for machine learning. Most of these people have been struggling with using Python for machine learning and are looking for an alternative that will give them better performance. We have been experimenting with C++ and Go Lang to see what the performance differences are. Most of these differences are in the execution of the ML code. This is great (more...)

Machine Learning Tools and Workbenches

The following is a list of the most commonly used tools and workbenches for machine learning. These are specific to machine learning only. This list does not include any libraries or frameworks. These are tools and workbenches only. Most machine learning tool offerings will include the following features:

  • Easy drag and drop capabilities
  • Data collection
  • Data preparation and cleaning
  • Model building
  • Data Visualization
  • Model Deployment
  • Integration with other tools and languages

As more and more (more...)

Time Series Forecasting in Oracle – Part 2

This is the second part about time-series data modeling using Oracle. Check out the first part here.

In this post I will take a time-series data set and model the data using the in-database time-series functions. The resulting model can in turn be used for predicting future values and trends.

The data set used in these examples is the Rossmann Store Sales data set. It is available on Kaggle and was used in one of their competitions.

(more...)

Data Sets for Analytics

When working with analytics, in whatever flavor, one of the key things you need is some data. Data comes in many different shapes and sizes, but where can you get some useful data, be it transactional, time-series, meta-data, analytical, master, categorical, numeric, regression, clustering, etc.?

Many of the popular analytics languages have some data sets built into them. For example the R language comes pre-loaded with data sets and these can be accessed using

 (more...)

Time Series Forecasting in Oracle – Part 1

 

Time-series analysis comprises methods for analyzing time-series data in order to extract meaningful statistics and other characteristics of the data. In this blog post I’ll introduce what time-series analysis is, the different types of time-series analysis, and how you can do this using SQL and PL/SQL in Oracle Database. I’ll have additional blog posts giving more detailed examples of Oracle functions and how they can be used for different time-series data problems.

(more...)

Python transforming Categorical to Numeric

When preparing data for input to machine learning algorithms you may have to perform certain types of data preparation.

In most enterprise solutions all or most of these tasks are automated for you, but in many languages they aren’t. The enterprise solutions are about ‘automating the boring stuff’ so that you don’t have to worry about it and waste valuable time doing boring, repetitive things.

The following examples illustrate a number of ways to record (more...)
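As a rough sketch of the kind of recoding discussed here, the snippet below converts a categorical column into integer codes and into 0/1 indicator columns using only the Python standard library. The function names `label_encode` and `one_hot_encode` and the sample values are my own illustrative assumptions, not taken from the original post.

```python
def label_encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    categories = sorted(set(values))
    mapping = {cat: code for code, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical sample data, for illustration only.
colours = ["red", "green", "blue", "green"]
codes, mapping = label_encode(colours)
indicators = one_hot_encode(colours)
```

Integer codes are compact but imply an ordering between categories, so indicator columns are often the safer choice when the categories have no natural order.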

Data Normalization in Oracle Data Mining

Normalization is the process of scaling continuous values down to a specific range, often between zero and one. Normalization transforms each numerical value by subtracting a number, called the shift, and dividing the result by another number called the scale. The normalization techniques include:

  • Min-Max Normalization : This is where the normalization is based on using the minimum value for the shift and (maximum - minimum) for the scale.
  • Scale Normalization : This is where (more...)
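The shift and scale description above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the Oracle Data Mining implementation; the function names and the sample values are assumptions made for the example.

```python
def normalize(values, shift, scale):
    """Apply (value - shift) / scale to every value."""
    if scale == 0:
        raise ValueError("scale must be non-zero")
    return [(v - shift) / scale for v in values]


def min_max_normalize(values):
    """Min-max: shift = minimum, scale = maximum - minimum."""
    lo, hi = min(values), max(values)
    return normalize(values, shift=lo, scale=hi - lo)


# Hypothetical sample data, for illustration only.
ages = [20, 30, 40, 60]
scaled = min_max_normalize(ages)
```

With min-max normalization the smallest input maps to zero and the largest maps to one. A column where every value is the same has a scale of zero, which this sketch rejects rather than guessing a result.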

Hivemall: Feature Scaling based on Min-Max values

One of the most common tasks when preparing data for data mining and machine learning is to take numerical data and scale it. Most enterprise and advanced tools and languages do this automatically for you, but with lower level languages you need to perform the task yourself. There are a number of approaches to doing this. In this example we will use the Min-Max approach.

With the Min-Max feature scaling approach, we need to find the (more...)