Downsizing the Data Set – Resampling and Binning of Time Series and other Data Sets

Data Sets are often too small. We do not have all data that we need in order to interpret, explain, visualize or use for training a meaningful model. However, quite often our data sets are too large. Or, more specifically, they have higher resolution than is necessary or even than is desirable. We may have a timeseries with values for every other second, although meaningful changes do not happen at lower frequencies than 30 seconds (more...)

Prepare Jupyter Notebook Workshop Environment through Docker container image and Bootstrap Notebook

Earlier this week, I presented a workshop on Data Analytics. I wanted to provide each of the participants with a fully prepared environment, right on everyone’s own laptop (and optionally in a cloud environment such as Katacoda). The environment consisted of Python 3.7, Jupyter Labs (for Notebooks), many additional Python libraries (Pandas, Plotly, Chart Studio, Matrix Profile, SAX, Fuzzy Search and many more) and a number of my own GitHub repositories containing the workshop (more...)

Determine the Language of a Document from the Letter Frequency – using Levenshtein Distance between sequences

imageEven though many languages share the same or a very similar alphabet, the use of letters in documents written in these languages is quite distinct. The letter ” e” is quite popular, but not the most used letter in every language. In fact, the letter frequency is very specific to a language – and can be used to determine the language of a document in a simple and pretty fast way.

The very simple steps (more...)

Tour de France Data Analysis using Strava data in Jupyter Notebook with Python, Pandas and Plotly – Step 2: combining and aligning multi rider data for analyzing and visualizing the Race

In this article, I analyze the race that took place in stage 14 of the 2019 Tour de France in a Jupyter Notebook using Python, Pandas and Plotly and based on the Strava performance data published by Steven Kruijswijk, Thomas de Gendt, Thibaut Pinot and Marco Haller. In this previous article I have explained how we can retrieve the Strava data for a specific rider for a stage in the Tour de France, and in (more...)

Tour de France Data Analysis using Strava data in Jupyter Notebook with Python, Pandas and Plotly – Step 1: single rider loading, exploration, wrangling, visualization

In this article, I will show how to analyze the performance of Steven Kruijswijk during stage 14 of the 2019 Tour de France in a Jupyter Notebook using Python, Pandas and Plotly. Strava collects data from athletes regarding their activities – such as running, cycling, walking and hiking. Members can upload data – and tens of millions do so, including some well known cyclists such as Steven Kruijswijk. In my previous article I have explained (more...)

Analyzing the 2019 Tour de France in depth using Strava performance data from Race Riders

This year’s Tour de France was quite a spectacle. Great performances, exciting stages, unexpected events: it had it all. Analyzing the race events as they unfolded during the stages of this year’s Tour is something I am keen to attempt. Using Jupyter Notebooks, Python and Pandas and Plotly for visualization, I am sure I can get more detailed stories extracted from raw race data. The starting point for such analysis activities is… the data.

However, (more...)

Monitoring Oracle Database using Prometheus

Prometheus is a very popular framework for gathering metrics from a plethora of runtime components, recording, analyzing, visualizing them and making them available for companion technologies such as Grafana. Prometheus harvests information from ‘exporters’ – processes that expose an HTTP endpoint where Prometheus can scrape metrics in a format that it understands.

Recently, I received an email with the following question:

I need help on monitoring Oracle Database using Prometheus. I Googled but not find (more...)

Get going with Project Fn on a remote Kubernetes Cluster from a Windows laptop–using Vagrant, VirtualBox, Docker, Helm and kubectl

| Mar 4, 2018


The challenge I describe in this article is quite specific. I have a Windows laptop. I have access to a remote Kubernetes cluster (on Oracle Cloud Infrastructure). I want to create Fn functions and deploy them to an Fn server running on that Kubernetes (k8s from now on) environment and I want to be able to execute functions running on k8s from my laptop. That’s it.

In this article I will take you on a (more…)