Forecast Model Tuning with Additional Regressors in Prophet

I’m going to share my experiment results with Prophet additional regressors. My goal was to check how extra regressor would weight on forecast calculated by Prophet.

Using dataset from Kaggle — Bike Sharing in Washington D.C. Dataset. Data comes with a number for bike rentals per day and weather conditions. I have created and compared three models:

1. Time series Prophet model with date and number of bike rentals
2. A model with additional (more...)

NumPy in a Nutshell

Hello and welcome back. I have started a new category in my blog about Python. The purpose of this post is to go through NumPy library. I will be using Jupyter for the demo but will provide the py file if you prefer to run it in PyCharm for example. NumPy is a core Python Linear Algebra library for Data Science used for faster array processing than the native Python lists with a bunch of (more...)

Serving Prophet Model with Flask — Predicting Future

The solution to demonstrate how to serve Prophet model API on the Web with Flask. Prophet — Open-Source Python library developed by Facebook to predict time series data.

An accurate forecast and future prediction are crucial almost for any business. This is an obvious thing and it doesn’t need explanation. There is a concept of time series data, this data is ordered by date and typically each date is assigned with one or more values specific to (more...)

Managing imbalanced Data Sets with SMOTE in Python

When working with data sets for machine learning, lots of these data sets and examples we see have approximately the same number of case records for each of the possible predicted values. In this kind of scenario we are trying to perform some kind of classification, where the machine learning model looks to build a model based on the input data set against a target variable. It is this target variable that contains the value (more...)

Python – str.maketrans()

Working on a Python code, I had a requirement for removing the single/double quotes and open/close brackets from the string of below format —

>>> text = """with summary as (select '
...  'p.col1,p.col2,p.col3, ROW_NUMBER() '
...  'OVER(PARTITION BY p.col1,p.col3 ORDER BY '
...  'p.col2) AS rk from (select * from (select '
...  'col2, col1, col3, '
...  'sum(col4) as col6 from '
...  '"demo"."tab1" a join '
...  "(select lpad(col5, 12, '0') as  (more...)

Graceful shutdown of forked workers in Python and JavaScript running in Docker containers

You might encounter a situation where you want to fork a script during execution. For example if the amount of forks is dependent on user input or another specific situation. I encountered such a situation in which I wanted to put load on a service using multiple concurrent processes. In addition, when running in a docker container, only the process with PID=1 receives a SIGTERM signal. If it has terminated, the worker processes receive a (more...)

Migrating Python ML Models to other languages

I’ve mentioned in a previous blog post about experiencing some performance issues with using Python ML in production. We needed something quicker and the possible languages we considered were C, C++, Java and Go Lang.

But the data science team used R and Python, with just a few more people using Python than R on the team.

One option was to rewrite everything into the language used in production. As you can imagine no-one wanted (more...)

Cat or Dog — Image Classification with Convolutional Neural Network

The goal of this post is to show how convnet (CNN — Convolutional Neural Network) works. I will be using classical cat/dog classification example described in François Chollet book — Deep Learning with Python. Source code for this example is available on François Chollet GitHub. I’m using this source code to run my experiment.

Convnet works by abstracting image features from the detail to higher level elements. An analogy can be described with the way how humans think. (more...)

Jupyter Notebook for retrieving JSON data from REST APIs

If data is available from REST APIs, Jupyter Notebooks are a fine vehicle for retrieving that data and storing it in a meaningful, processable format. This article introduces an example of a such a dataset:


Oracle OpenWorld 2018 was a conference that took place in October 2018 in San Francisco. Over 30,000 attendees participated and visited some 2000 sessions. Raw data from the session catalog is available from an API – a REST service that (more...)

The Full Oracle OpenWorld and CodeOne 2018 Conference Session Catalog as JSON data set (for data science purposes)

Oracle OpenWorld and CodeOne 2018 are two co-located conferences that took place in October 2018. Some 2000 sessions presented by over 2500 presenters form the core of these conferences. Many details are known about each of the sessions and the speakers – from title, abstract, room (size), date and time, session slides, session type and key topics to first name, last name, Twitter handle, picture, company, job title and bio. image

A data set is available (more...)

Build it Yourself — Chatbot API with Keras/TensorFlow Model

Is not that complex to build your own chatbot (or assistant, this word is a new trendy term for chatbot) as you may think. Various chatbot platforms are using classification models to recognize user intent. While obviously, you get a strong heads-up when building a chatbot on top of the existing platform, it never hurts to study the background concepts and try to build it yourself. Why not use a similar model yourself. Chatbot implementation (more...)

Understanding Nested Lists Dictionaries of JSON in Python and AWS CLI

After lots of hair pulling, bouts of frustration, I was able to grasp this nested list and dictionary thingie in JSON output of AWS cli commands such as describe-db-instances and others. If you run the describe-db-instances for rds or describe-instances for ec2, you get a huge pile of JSON mumbo-jumpo with all those curly and square brackets studded with colons and commas. The output is heavily nested.

For example, if you do :

aws rds (more...)

Game of Thrones Series 8: Real Time Sentiment Scoring with Apache Kafka, KSQL, Google’s Natural Language API and Python

Game of Thrones Series 8: Real Time Sentiment Scoring with Apache Kafka, KSQL, Google's Natural Language API and Python

Hi, Game of Thrones aficionados, welcome to GoT Series 8 and my tweet analysis! If you missed any of the prior season episodes, here are I, II and III. Finally, after almost two years, we have a new series and something interesting to write about! If you didn't watch Episode 1, do it before reading this post as it might contain spoilers!

Let's now start with a preview of the starting scene of Episode (more...)

named tuple to JSON – Python

In pgdb – PostgreSQL DB API, the cursor which is used to manage the context of a fetch operation returns list of named tuples. These named tuples contain field names same as the column names of the database query.

An example of a row from the list of named tuples –

Row(log_time=datetime.datetime(2019, 3, 20, 5, 41, 29, 888000, tzinfo=), user_name='admin', connection_from='', command_tag='INSERT', message='AUDIT: SESSION,1,1,WRITE,INSERT,TABLE,user.demodml,"insert into user.demodml (id) values (1),(2),(3),(4),(5),(6),(7),(8),(9),(11);",',  (more...)

Publishing Machine Learning API with Python Flask

Flask is fun and easy to setup, as it says on Flask website. And that's true. This microframework for Python offers a powerful way of annotating Python function with REST endpoint. I’m using Flask to publish ML model API to be accessible by the 3rd party business applications.

This example is based on XGBoost.

For better code maintenance, I would recommend using a separate Jupyter notebook where ML model API will be published. Import Flask (more...)

Selecting Optimal Parameters for XGBoost Model Training

There is always a bit of luck involved when selecting parameters for Machine Learning model training. Lately, I work with gradient boosted trees and XGBoost in particular. We are using XGBoost in the enterprise to automate repetitive human tasks. While training ML models with XGBoost, I created a pattern to choose parameters, which helps me to build new models quicker. I will share it in this post, hopefully you will find it useful too.

I’m (more...)

Prepare Your Data for Machine Learning Training

The process to prepare data for Machine Learning model training to me looks somewhat similar to the process of preparing food ingredients to cook dinner. You know in both cases it takes time, but then you are rewarded with tasty dinner or a great ML model.

I will not be diving here into data science subject and discussing how to structure and transform data. It all depends on the use case and there are so (more...)

Jupyter Notebook — Forget CSV, fetch data from DB with Python

If you read a book, article or blog about Machine Learning — high chances it will use training data from CSV file. Nothing wrong with CSV, but let’s think if it is really practical. Wouldn’t be better to read data directly from the DB? Often you can’t feed business data directly into ML training, it needs pre-processing — changing categorial data, calculating new data features, etc. Data preparation/transformation step can be done quite easily with (more...)

Python List

This blog post is about appending data elements to list in Python.

Suppose we have a simple list “x”, we will look at different ways to append elements to this list.

x = [1, 2, 3]

The “append” method appends only a single element

>>> x
[1, 2, 3]
>>> x.append(4)
>>> x
[1, 2, 3, 4]

>> x.append(5, 6, 7)
Traceback (most recent call last):
File "", line 1, in
TypeError:  (more...)

Time SQL Execution with Python

I've said before in this blog how I find Python to be very useful for doing various things, including processing data to or from an Oracle database. Here is another example where a relatively simple and straightforward piece of Python code can deliver something that is very useful - in this case measuring the elapsed time of SQL queries executed on an Oracle database. We often need to execute a given SQL query and see (more...)