Node JS application running on GraalVM – interoperating with Java, Python, R and more

| Nov 4, 2019

When you install GraalVM, one of the things you get is a Node runtime environment (GraalVM 19.2.1 is based on Node 10.16.3, with support for the core Node libraries and an understanding of NPM modules, and has a JavaScript engine that is ECMAScript 2019 compliant). Instead of V8, the usual JavaScript execution engine, this GraalVM environment uses GraalJS and the JVM as its execution platform. GraalJS runs Java Byte code (more...)

Getting Your Hands Dirty with TensorFlow 2.0 and Keras API

| Oct 31, 2019
Diving into the technical details of regression model creation with TensorFlow 2.0 and the Keras API. In TensorFlow 2.0, Keras comes out of the box with the TensorFlow library. The API is simplified and more convenient to use.

Read the complete article here.

Python application running on GraalVM and Polyglotting with JavaScript, R, Ruby and Java

| Oct 30, 2019

GraalVM is, among other things, a polyglot language runtime. It can run applications written in many languages – JVM languages like Java, Scala, Groovy and Kotlin as well as non-JVM languages such as Python, R, Ruby, JavaScript and LLVM. GraalVM also allows applications in any of these languages to execute code snippets written in any of the other languages it supports. It is like the cab driver that can speak many languages and also is (more...)

Pandas Scratchpad – I

| Oct 6, 2019

This blog is a scratchpad for day-to-day Pandas commands.

pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

1. A few quick ways to create a Pandas DataFrame

DataFrame from Dict of List –


DataFrame from List of List –


DataFrame from List of Dict –


DataFrame using zip function –


data = {'Name':['Iron Man', 'Deadpool', 'Captain America', 'Thor', 'Hulk', 'Spider Man'], 'Age':[48, 30, 100, 150, 50,  (more...)
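
The four approaches listed above can be sketched roughly as follows. This is a minimal illustration that reuses a few names and ages from the excerpt; the variable names are my own and are not taken from the original post.

```python
import pandas as pd

names = ["Iron Man", "Deadpool", "Captain America"]
ages = [48, 30, 100]

# 1. Dict of lists: each key becomes a column name.
df_dict_of_lists = pd.DataFrame({"Name": names, "Age": ages})

# 2. List of lists: each inner list is a row, columns are named explicitly.
df_list_of_lists = pd.DataFrame(
    [["Iron Man", 48], ["Deadpool", 30], ["Captain America", 100]],
    columns=["Name", "Age"],
)

# 3. List of dicts: each dict is a row, keys become column names.
df_list_of_dicts = pd.DataFrame(
    [
        {"Name": "Iron Man", "Age": 48},
        {"Name": "Deadpool", "Age": 30},
        {"Name": "Captain America", "Age": 100},
    ]
)

# 4. zip: pair the lists up into row tuples first.
df_zip = pd.DataFrame(list(zip(names, ages)), columns=["Name", "Age"])
```

All four calls should produce the same two-column frame, so the choice mostly depends on the shape your raw data already has.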

Python – Counter – Compare Lists

| Aug 30, 2019

A few days back I wrote about using sorted(list) to compare two lists.

Recently I learned we can also use Counter to compare lists without taking their order into account.
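
A minimal sketch of the idea, using sample lists of my own choosing: Counter builds a multiset of the items, so two lists compare equal when they hold the same items the same number of times, whatever their order. Note that the items must be hashable for this to work.

```python
from collections import Counter

a = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
c = [("d", 4), ("c", 3), ("a", 1), ("b", 2)]
b = [("b", 2), ("c", 3), ("a", 1)]

# Same items in a different order: the counters are equal.
same_items = Counter(a) == Counter(c)

# One item is missing from b: the counters differ.
different_items = Counter(a) == Counter(b)

# Duplicates are counted, so these two lists are not considered equal.
duplicates_differ = Counter([1, 1, 2]) == Counter([1, 2, 2])
```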


Happy Learning !!

Merge json files using Pandas

| Aug 25, 2019

A quick demo of merging multiple JSON files using Pandas –

import pandas as pd
import glob
import json

file_list = glob.glob("*.json")
>>> file_list
['b.json', 'c.json', 'a.json']

Use enumerate to assign a counter to each file.

allFilesDict = {idx: name for idx, name in enumerate(file_list, 1)}
>>> allFilesDict
{1: 'b.json', 2: 'c.json', 3: 'a.json'}

Append the data to a list –

>>> data = []

for k,v in allFilesDict.items():
    if 1  (more...)
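
Since the excerpt is cut off here, the following is only a sketch of one common way to finish the job, not necessarily the approach the post takes. It assumes each file holds a single flat JSON object; the sample records and the temporary directory are invented so the example is self-contained.

```python
import glob
import json
import os
import tempfile

import pandas as pd

# Write three small sample files (hypothetical content).
tmp_dir = tempfile.mkdtemp()
samples = {
    "a.json": {"title": "A", "view_count": 1},
    "b.json": {"title": "B", "view_count": 2},
    "c.json": {"title": "C", "view_count": 3},
}
for name, record in samples.items():
    with open(os.path.join(tmp_dir, name), "w") as fh:
        json.dump(record, fh)

frames = []
# sorted() makes the row order deterministic; glob alone gives no ordering guarantee.
for path in sorted(glob.glob(os.path.join(tmp_dir, "*.json"))):
    with open(path) as fh:
        record = json.load(fh)
    # Wrap the dict in a list so each file becomes exactly one row.
    frames.append(pd.DataFrame([record]))

# Stack the one-row frames and renumber the index from 0.
merged = pd.concat(frames, ignore_index=True)
```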

Pandas – ValueError: If using all scalar values, you must pass an index

| Aug 13, 2019

Reading a JSON file using Pandas read_json can fail with “ValueError: If using all scalar values, you must pass an index”. Let's see with an example –

cat a.json
{
  "creator": "CaptainAmerica",
  "last_modifier": "NickFury",
  "title": "Captain America: The First Avenger",
  "view_count": 12000
}
>>> import pandas as pd
>>> import glob
>>> for f in glob.glob('*.json'):
...     print(f)
>>> pd.read_json('a.json')
Traceback (most recent call last):
  File  (more...)
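
The same error can be reproduced without a file by passing a dict of scalars straight to the DataFrame constructor. The sketch below shows that, along with two workarounds I know to work; it is not necessarily the fix the original post goes on to describe.

```python
import pandas as pd

record = {
    "creator": "CaptainAmerica",
    "last_modifier": "NickFury",
    "title": "Captain America: The First Avenger",
    "view_count": 12000,
}

# Every value is a scalar, so pandas cannot tell how many rows to build.
try:
    pd.DataFrame(record)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Workaround 1: wrap the dict in a list so it becomes a single row.
df_row = pd.DataFrame([record])

# Workaround 2: pass an explicit index, as the error message suggests.
df_indexed = pd.DataFrame(record, index=[0])
```

For a file on disk, loading it with json.load first and then applying either workaround should give the same one-row frame.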

Python – sort() vs sorted(list)

| Aug 4, 2019

You can compare lists using sort() or sorted(list), but be careful with sort() –

>>> c = [('d',4), ('c',3), ('a',1), ('b', 2)]
>>> a = [('a',1), ('b', 2), ('c',3), ('d',4)]
>>> a.sort() == c.sort()
True
>>> a = [('a',1), ('b', 2), ('c',3), ('d',4)]
>>> b = [('b',2), ('c', 3), ('a',1)]
>>> a.sort() == b.sort()
True

>>> a = [('a',1), ('b', 2), ('c',3), ('d',4)]
>>> b = [('b',2), ('c', 3),  (more...)
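
The reason for the warning, as a short sketch: list.sort() sorts in place and returns None, so comparing two sort() calls always compares None with None. sorted() returns a new list, so the contents are what get compared.

```python
a = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
b = [("b", 2), ("c", 3), ("a", 1)]

# Both calls return None, so this is True even though the lists differ.
misleading = a.sort() == b.sort()

# sorted() returns new sorted lists, so this compares the actual items.
reliable = sorted(a) == sorted(b)
```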

Report Time Execution Prediction with Keras and TensorFlow

| Jul 31, 2019
The aim of this post is to explain Machine Learning to software developers in hands-on terms. The model is based on a common use case in enterprise systems: predicting the wait time until a business report is generated.

Report generation in business applications typically takes time, anywhere from a few seconds to minutes, because it usually has to fetch and process many records. Users often get frustrated, (more...)

Forecast Model Tuning with Additional Regressors in Prophet

| Jul 15, 2019
I’m going to share the results of my experiment with additional regressors in Prophet. My goal was to check how an extra regressor would weigh on the forecast calculated by Prophet.

I’m using a dataset from Kaggle: Bike Sharing in Washington D.C. The data comes with the number of bike rentals per day and weather conditions. I have created and compared three models:

1. Time series Prophet model with date and number of bike rentals
2. A model with additional (more...)

NumPy in a Nutshell

| Jul 3, 2019

Hello and welcome back. I have started a new category on my blog about Python. The purpose of this post is to go through the NumPy library. I will be using Jupyter for the demo but will provide the .py file if you prefer to run it in PyCharm, for example. NumPy is a core Python linear algebra library for data science, used for faster array processing than native Python lists, with a bunch of (more...)
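
To give a flavour of the difference from native lists, here is a tiny sketch with values of my own choosing: an arithmetic operation on a NumPy array applies to every element at once, where a plain list needs an explicit loop or comprehension.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python list: element-wise work needs a loop or comprehension.
doubled_list = [v * 2 for v in values]

# NumPy array: the multiplication is applied to every element at once.
arr = np.array(values)
doubled_arr = arr * 2

# Linear algebra helpers come built in, e.g. the dot product of arr with itself.
dot = arr @ arr
```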

Serving Prophet Model with Flask — Predicting Future

| Jul 3, 2019
A solution demonstrating how to serve a Prophet model API on the web with Flask. Prophet is an open-source Python library developed by Facebook for predicting time series data.

An accurate forecast and future prediction are crucial for almost any business. This is obvious and doesn’t need explanation. There is a concept of time series data: this data is ordered by date, and typically each date is assigned one or more values specific to (more...)

Python – str.maketrans()

| Jun 27, 2019

While working on some Python code, I needed to remove the single/double quotes and open/close brackets from a string in the format below –

>>> text = """with summary as (select '
...  'p.col1,p.col2,p.col3, ROW_NUMBER() '
...  'OVER(PARTITION BY p.col1,p.col3 ORDER BY '
...  'p.col2) AS rk from (select * from (select '
...  'col2, col1, col3, '
...  'sum(col4) as col6 from '
...  '"demo"."tab1" a join '
...  "(select lpad(col5, 12, '0') as  (more...)
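
A minimal sketch of how str.maketrans() can handle this. The sample string is a shortened, hypothetical stand-in for the text above, and I am assuming "brackets" means round parentheses. When maketrans() is given three arguments, every character in the third one is mapped to None, which makes translate() delete it.

```python
# Hypothetical, shortened sample in the spirit of the string above.
text = """('select "demo"."tab1" from (t)')"""

# Empty first and second arguments: no character replacements.
# Third argument: characters to delete (quotes and parentheses).
table = str.maketrans("", "", "'\"()")

cleaned = text.translate(table)
```

This does the whole cleanup in one pass over the string, which tends to read better than chaining several replace() calls.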

Cat or Dog — Image Classification with Convolutional Neural Network

| May 5, 2019
The goal of this post is to show how a convnet (CNN, Convolutional Neural Network) works. I will be using the classic cat/dog classification example described in François Chollet's book, Deep Learning with Python. Source code for this example is available on François Chollet's GitHub. I’m using this source code to run my experiment.

A convnet works by abstracting image features from the detail to higher level elements. An analogy can be drawn with the way humans think. (more...)

Build it Yourself — Chatbot API with Keras/TensorFlow Model

| Apr 24, 2019
It's not as complex to build your own chatbot (or assistant, the new trendy term for a chatbot) as you may think. Various chatbot platforms use classification models to recognize user intent. While you obviously get a strong head start when building a chatbot on top of an existing platform, it never hurts to study the background concepts and try to build it yourself. Why not use a similar model yourself? Chatbot implementation (more...)

Understanding Nested Lists Dictionaries of JSON in Python and AWS CLI

| Apr 20, 2019

After lots of hair pulling and bouts of frustration, I was able to grasp this nested list and dictionary thingie in the JSON output of AWS CLI commands such as describe-db-instances and others. If you run describe-db-instances for RDS or describe-instances for EC2, you get a huge pile of JSON mumbo-jumbo with all those curly and square brackets studded with colons and commas. The output is heavily nested.

For example, if you do :

aws rds (more...)
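
To make the nesting concrete, here is a sketch that walks a hand-written, heavily trimmed sample shaped like describe-db-instances output. The identifiers and addresses are made up, and real output carries many more keys. Curly braces become Python dicts and square brackets become lists.

```python
import json

# Invented sample data; only the overall shape mirrors the CLI output.
raw = """
{
  "DBInstances": [
    {
      "DBInstanceIdentifier": "demo-db-1",
      "Engine": "postgres",
      "Endpoint": {"Address": "demo-db-1.example.com", "Port": 5432}
    },
    {
      "DBInstanceIdentifier": "demo-db-2",
      "Engine": "mysql",
      "Endpoint": {"Address": "demo-db-2.example.com", "Port": 3306}
    }
  ]
}
"""
response = json.loads(raw)

# dict -> list -> dict -> dict: key, then position, then key, then key.
first_port = response["DBInstances"][0]["Endpoint"]["Port"]

# Loop over the list and pick values out of each inner dict.
endpoints = {
    db["DBInstanceIdentifier"]: db["Endpoint"]["Address"]
    for db in response["DBInstances"]
}
```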

named tuple to JSON – Python

| Apr 5, 2019

In pgdb (PostgreSQL DB API), the cursor, which is used to manage the context of a fetch operation, returns a list of named tuples. The field names of these named tuples are the same as the column names of the database query.

An example of a row from the list of named tuples –

Row(log_time=datetime.datetime(2019, 3, 20, 5, 41, 29, 888000, tzinfo=), user_name='admin', connection_from='', command_tag='INSERT', message='AUDIT: SESSION,1,1,WRITE,INSERT,TABLE,user.demodml,"insert into user.demodml (id) values (1),(2),(3),(4),(5),(6),(7),(8),(9),(11);",',  (more...)
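
One way to turn such rows into JSON, sketched with a hypothetical Row type that carries only three of the fields above (and a naive datetime, since the tzinfo value is not shown). Passing a named tuple straight to json.dumps() serializes it as a plain array and loses the field names, so the sketch converts each row with _asdict() first. default=str is a simple assumption for handling the datetime value; a real application may want a specific timestamp format.

```python
import datetime
import json
from collections import namedtuple

# Hypothetical row type with a subset of the fields from the example.
Row = namedtuple("Row", ["log_time", "user_name", "command_tag"])
rows = [
    Row(datetime.datetime(2019, 3, 20, 5, 41, 29, 888000), "admin", "INSERT"),
]

# _asdict() maps field names to values; default=str converts the datetime.
payload = json.dumps([row._asdict() for row in rows], default=str)
```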

Publishing Machine Learning API with Python Flask

| Apr 1, 2019
Flask is fun and easy to set up, as it says on the Flask website. And that's true. This microframework for Python offers a powerful way of annotating a Python function with a REST endpoint. I’m using Flask to publish an ML model API so that it is accessible to third-party business applications.

This example is based on XGBoost.

For better code maintenance, I would recommend using a separate Jupyter notebook where the ML model API will be published. Import Flask (more...)
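
A minimal sketch of what annotating a function with a REST endpoint looks like in Flask. The predict() helper is a hypothetical stand-in that just averages its inputs; it is not the XGBoost model from the post, and the /predict route and payload shape are my own assumptions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    # Hypothetical stand-in for a trained model: returns the mean of the inputs.
    return sum(features) / len(features)


# The decorator binds the function to POST requests on /predict.
@app.route("/predict", methods=["POST"])
def predict_endpoint():
    payload = request.get_json()
    score = predict(payload["features"])
    return jsonify({"score": score})
```

In a real service you would load the trained model once at startup and call it inside the endpoint instead of the stand-in helper.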

Selecting Optimal Parameters for XGBoost Model Training

| Mar 13, 2019
There is always a bit of luck involved when selecting parameters for Machine Learning model training. Lately, I have been working with gradient boosted trees, and XGBoost in particular. We are using XGBoost in the enterprise to automate repetitive human tasks. While training ML models with XGBoost, I created a pattern for choosing parameters, which helps me build new models quicker. I will share it in this post; hopefully you will find it useful too.

I’m (more...)

Prepare Your Data for Machine Learning Training

| Mar 6, 2019
To me, the process of preparing data for Machine Learning model training looks somewhat similar to preparing food ingredients to cook dinner. In both cases it takes time, but then you are rewarded with a tasty dinner or a great ML model.

I will not dive into the data science subject here or discuss how to structure and transform data. It all depends on the use case and there are so (more...)