Selecting Optimal Parameters for XGBoost Model Training

There is always a bit of luck involved when selecting parameters for Machine Learning model training. Lately, I work with gradient boosted trees and XGBoost in particular. We are using XGBoost in the enterprise to automate repetitive human tasks. While training ML models with XGBoost, I created a pattern to choose parameters, which helps me to build new models quicker. I will share it in this post, hopefully you will find it useful too.

I’m (more...)

Prepare Your Data for Machine Learning Training

The process to prepare data for Machine Learning model training to me looks somewhat similar to the process of preparing food ingredients to cook dinner. You know in both cases it takes time, but then you are rewarded with tasty dinner or a great ML model.

I will not be diving here into data science subject and discussing how to structure and transform data. It all depends on the use case and there are so (more...)

Jupyter Notebook — Forget CSV, fetch data from DB with Python

If you read a book, article or blog about Machine Learning — high chances it will use training data from CSV file. Nothing wrong with CSV, but let’s think if it is really practical. Wouldn’t be better to read data directly from the DB? Often you can’t feed business data directly into ML training, it needs pre-processing — changing categorial data, calculating new data features, etc. Data preparation/transformation step can be done quite easily with (more...)

Python List

This blog post is about appending data elements to list in Python.

Suppose we have a simple list “x”, we will look at different ways to append elements to this list.

x = [1, 2, 3]

The “append” method appends only a single element

>>> x
[1, 2, 3]
>>> x.append(4)
>>> x
[1, 2, 3, 4]
>>>

>> x.append(5, 6, 7)
Traceback (most recent call last):
File "", line 1, in
TypeError:  (more...)

Time SQL Execution with Python

I've said before in this blog how I find Python to be very useful for doing various things, including processing data to or from an Oracle database. Here is another example where a relatively simple and straightforward piece of Python code can deliver something that is very useful - in this case measuring the elapsed time of SQL queries executed on an Oracle database. We often need to execute a given SQL query and see (more...)

How to install Python 3 on Oracle Linux

You can install Python 3 on your Oracle Linux 7 environment with three simple steps: sudo yum install -y yum-utils sudo yum-config-manager --enable *EPEL sudo yum install -y python36 As a first step, in case you don’t have it yet on your system, is to install the yum-utils package. This package includes the yum-config-manager which allows you … Continue reading "How to install Python 3 on Oracle Linux"

Tweet Escalation to Your Support Team — Sentiment Analysis with Machine Learning

I have published an article on Towards Data Science. I explain end-to-end technical solution which would help to streamline your company support process. With the focus on airline support requests received from Twitter. It could save a lot of time and money for the support department if they would know in advance which request is more critical and must be handled with higher priority.

Read the full article here - Solution to automate tweet sentiment (more...)

Python & Oracle 1

While Python is an interpreted language, Python is a very popular programming language. You may ask yourself why it is so popular? The consensus answers to why it’s so popular points to several factors. For example, Python is a robust high-level programming language that lets you:

  • Get complex things done quickly
  • Automate system and data integration tasks
  • Solve complex analytical problems

You find Python developers throughout the enterprise. Development, release engineering, IT operations, and support (more...)

API for Amazon SageMaker ML Sentiment Analysis

Assume you manage support department and want to automate some of the workload which comes from users requesting support through Twitter. Probably you already would be using chatbot to send back replies to users. Bu this is not enough - some of the support requests must be taken with special care and handled by humans. How to understand when tweet message should be escalated and when no? Machine Learning for Business book got an answer. (more...)

How to: Adding Speech to Oracle Digital Assistant; Talk to me Goose

At Oracle Code One in October, and also on DOAG in Nurnberg Germany in November I presented on how to go beyond your regular chatbot. This presentation contained a part on exposing your Oracle Digital Assistant over Alexa and also a part on face recognition. I finally found the time to blog about it. In this blogpost I will share details of the Alexa implementation in this solution.

Machine Learning – Date Feature Transformation Explained

Machine Learning is all about data. The way how you transform and feed data into ML algorithm - greatly depends training success. I will give you an example based on date type data. I will be using scenario described in my previous post - Machine Learning - Getting Data Into Right Shape. This scenario is focused around invoice risk, ML trains to recognize when invoice payment is at risk.

One of the key attributes in (more...)

Python – Flatten List of Lists

Itertools is one of the most powerful module in Python. Today I had requirement to flatten list of lists and itertools made it so easy.

My list —

>> val = [['a','b'],'c',['d','e','f']]

Required Result

['a', 'b', 'c', 'd', 'e', 'f']

How do you do it? Itertools to the resuce —

>>> list(chain.from_iterable(val))
['a', 'b', 'c', 'd', 'e', 'f']

So simple !!

Machine Learning – Getting Data Into Right Shape

When you build machine learning model, first start with the data - make sure input data is prepared well and it represents true state of what you want machine learning model to learn. Data preparation task takes time, but don't hurry - quality data is a key for machine learning success. In this post I will go through essential steps required to bring data into right shape to feed it into machine learning algorithm.

Sample (more...)

Python — Enumerate

Suppose you have a dataset (data) and want to find every 5th item.  How would you do it?

data = [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

The first thing which could come to mind in using slice, but that won’t work as its based on index and index starts from 0.

>>> data[::5]
[1, 11]

The answer should be = [9, 19]

This is where enumerate comes in play. (more...)

Twitter Analytics using Python – Part 3

This is my third (of five) post on using Python to process Twitter data.

Check out my all the posts in the series.

In this post I'll have a quick look at how to save the tweets you have download. By doing this allows you to access them at a later point and to perform more analysis. You have a few instances of saving the tweets. The first of these is to save them to (more...)

Twitter Analytics using Python – Part 2

This is my second (of five) post on using Python to process Twitter data.

Check out my all the posts in the series.

In this post I was going to look at two particular aspects. The first is the converting of Tweets to Pandas. This will allow you to do additional analysis of tweets. The second part of this post looks at how to setup and process streaming of tweets. The first part was longer (more...)

Twitter Analytics using Python – Part 1

(This is probably the first part of, probably, a five part blog series on twitter analytics using Python. Make sure to check out the other posts and I'll post a wrap up blog post that will point to all the posts in the series)

(Yes there are lots of other examples out there, but I've put these notes together as a reminder for myself and a particular project I'm testing)

In this first blog post (more...)

Creating a Word Cloud using Python

Over the past few days I've been doing a bit more playing around with Python, and create a word cloud. Yes there are lots of examples out there that show this, but none of them worked for me. This could be due to those examples using the older version of Python, libraries/packages no long exist, etc. There are lots of possible reasons. So I have to piece it together and the code given below is (more...)

Introduction to Apache Spark with Python

Today, I spoke about “Apache Spark with Python” at Big Talk #2 meet-up in Istanbul Teknokent ARI-3, another event organized by Komtas for big data community. We had almost full room. Mine was the last session of the day but the audience was still very focused and eager to listen the subjects, so for me, the event was great.

By the way, I also enjoyed the sessions of other speakers: Zekeriya Beşioğlu spoke about Data (more...)

Python and Oracle : Fetching records and setting buffer size

If you used other languages, including Oracle PL/SQL, more than likely you will have experienced having to play buffering the number of records that are returned from a cursor. Typically this is needed when you are processing more than a few hundred records. The default buffering size is relatively small and by increasing the size of the number of records to be buffered can dramatically improve the performance of your code.

As with all things (more...)