Twitter Analytics using Python – Part 3

This is my third (of five) post on using Python to process Twitter data.

Check out all the posts in the series.

In this post I'll have a quick look at how to save the tweets you have downloaded. Doing this allows you to access them at a later point and to perform more analysis. You have a few options for saving the tweets. The first of these is to save them to (more...)

Twitter Analytics using Python – Part 2

This is my second (of five) post on using Python to process Twitter data.

Check out all the posts in the series.

In this post I was going to look at two particular aspects. The first is converting tweets to Pandas, which allows you to do additional analysis of the tweets. The second part of this post looks at how to set up and process streaming of tweets. The first part was longer (more...)

Twitter Analytics using Python – Part 1

(This is the first part of what will probably be a five-part blog series on Twitter analytics using Python. Make sure to check out the other posts; I'll also write a wrap-up post that points to all the posts in the series.)

(Yes, there are lots of other examples out there, but I've put these notes together as a reminder for myself and for a particular project I'm testing.)

In this first blog post (more...)

Creating a Word Cloud using Python

Over the past few days I've been doing a bit more playing around with Python, and created a word cloud. Yes, there are lots of examples out there that show this, but none of them worked for me. This could be due to those examples using an older version of Python, libraries/packages that no longer exist, etc. There are lots of possible reasons. So I had to piece it together and the code given below is (more...)
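Whatever library ends up drawing the picture, a word cloud is built from word counts. The following is a minimal sketch of that counting step only, using the standard library; it is not the code from the full post. The stop-word list, the sample text and the function name are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; word cloud tools usually ship a longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}

def word_frequencies(text, top_n=10):
    """Return the top_n most common words as (word, count) pairs."""
    # Lower-case the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up sample text, only here so the sketch runs on its own.
sample_text = "Python makes a word cloud easy. A word cloud sizes each word by count."
frequencies = word_frequencies(sample_text, top_n=3)
print(frequencies)
```

The resulting (word, count) pairs are what decides how large each word is drawn.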

Introduction to Apache Spark with Python

Today, I spoke about “Apache Spark with Python” at the Big Talk #2 meet-up in Istanbul Teknokent ARI-3, another event organized by Komtas for the big data community. We had an almost full room. Mine was the last session of the day, but the audience was still very focused and eager to listen, so for me the event was great.

By the way, I also enjoyed the sessions of other speakers: Zekeriya Beşioğlu spoke about Data (more...)

Python and Oracle : Fetching records and setting buffer size

If you have used other languages, including Oracle PL/SQL, more than likely you will have had to play with buffering the number of records that are returned from a cursor. Typically this is needed when you are processing more than a few hundred records. The default buffer size is relatively small, and increasing the number of records to be buffered can dramatically improve the performance of your code.

As with all things (more...)
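As a rough illustration of the idea, the Python DB-API gives every cursor an arraysize attribute, and fetchmany() returns that many rows per call when no size is passed. The sketch below uses the standard library's sqlite3 module as a stand-in so it runs without an Oracle database; cx_Oracle cursors expose the same attribute. The table name, row count and buffer size of 250 are assumptions chosen for the example.

```python
import sqlite3

# sqlite3 stands in for Oracle here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER)")
conn.executemany("INSERT INTO demo VALUES (?)", [(i,) for i in range(1000)])

cursor = conn.cursor()
cursor.arraysize = 250  # assumed value: rows returned per fetchmany() call
cursor.execute("SELECT id FROM demo ORDER BY id")

batch_sizes = []
total_rows = 0
while True:
    rows = cursor.fetchmany()  # no size given, so cursor.arraysize is used
    if not rows:
        break
    batch_sizes.append(len(rows))
    total_rows += len(rows)

conn.close()
print(batch_sizes, total_rows)
```

Against a real Oracle connection the right value depends on row width and network latency, so it is worth timing a few sizes rather than assuming one.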

Using Spark to Process Data From Cassandra for Analytics

After my presentation about Apache Cassandra, most people asked if they can run analytical queries on Cassandra, and how they can integrate Spark with Cassandra. So I decided to write a blog post to demonstrate how we can process data from Cassandra using Spark. In this blog post, I’ll show how I can build a testing environment on Oracle Cloud (Spark + Cassandra), load sample data to Cassandra, and query the data using Spark.

Let (more...)

Oracle and Python setup with cx_Oracle

Is Python the new R?

Maybe, maybe not, but what I'm finding in recent months is that more companies are asking me to use Python instead of R for some of my work.

In this blog post I will walk through the steps of setting up the Oracle driver for Python, called cx_Oracle. The documentation for this driver is good and detailed, with plenty of examples available on GitHub. Hopefully there isn't anything new in this (more...)

Starting and Stopping Oracle Reports Servers with WLST

Normally people start and stop their Oracle Reports Servers with the scripts Oracle provides in $DOMAIN_HOME/bin.

The problem with this set of scripts is that they take a really long time to complete, and you need to execute them for each Reports Server:
# Let's measure the time
time ./ rep_server1
Starting system Component rep_server1 ...

Initializing WebLogic Scripting Tool (WLST) ...

Welcome to WebLogic Server (more...)

Building Classrooms in the Cloud

Jumpbox Lab Server

Let’s face it: education without interaction is about as effective as shouting origami instructions at a lumberjack who is cutting down trees. Sure, your informative lessons will come in handy when the product of their work finally becomes paper, but it will be long forgotten and ultimately worthless by then. The only way a student is going to learn is if they can put (more...)

A Neural Network Scoring Engine in PL/SQL

Topic: In this post, you will find an example of how to build and deploy a basic artificial neural network scoring engine using PL/SQL for recognizing handwritten digits. This post is intended for learning purposes, in particular for Oracle practitioners who want a hands-on introduction to neural networks.


Machine learning, and neural networks in particular, are currently hot topics in data processing. Many tools and platforms are now easily available to work and experiment (more...)

“What do you mean there’s line breaks in the address?” said SQLLDR

I had a large-ish CSV to load and a problem: line breaks inside some of the delimited fields.

Like these two records:

one, two, "three beans", four
five, six, "seven
beans", "eight wonderful beans"

SQL*Loader simply won’t handle this, as plenty of sad forum posts attest. The file needs pre-processing, and here is a little Python script to do it, adapted from Jmoreland91’s solution on Stack Overflow.

import sys, csv, os
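The script itself is cut off above, so here is a minimal sketch of this kind of pre-processing. It is not the adapted Stack Overflow solution. It relies on Python's csv module, which reads a quoted field containing a line break as a single field. The function names and the choice of a space as the replacement are assumptions; skipinitialspace=True is needed because the sample records have a space between the comma and the opening quote.

```python
import csv
import io

def _flatten(field, replacement):
    """Replace any line break inside a single field."""
    return field.replace("\r\n", replacement).replace("\r", replacement).replace("\n", replacement)

def flatten_line_breaks(src, dst, replacement=" "):
    """Copy CSV rows from src to dst with one physical line per record."""
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        writer.writerow([_flatten(field, replacement) for field in row])

# The two records from the post, the second with a line break inside a quoted field.
sample = (
    'one, two, "three beans", four\n'
    'five, six, "seven\n'
    'beans", "eight wonderful beans"\n'
)
out = io.StringIO()
flatten_line_breaks(io.StringIO(sample, newline=""), out)
cleaned = out.getvalue()
print(cleaned)
```

For a real file, open both the input and the output with newline="" as the csv documentation recommends, and pass the file objects in place of the StringIO buffers.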

Rename multiple exported Files after using SQL Developer’s Cart to export from Oracle database

If you’re searching for “export Oracle BLOB”, the article by Jeff Smith titled “Exporting Multiple BLOBs with Oracle SQL Developer” is usually at the top of the search results. SQL Developer features the Shopping Cart, which exports BLOBs out of the database without using scripts. I don’t want to go into detail, as Jeff has already explained well in his post what it is and how to use it. One (more...)

Multisessioning with Python

I'll admit that I pretty constantly have at least one window either open into SQL*Plus or at the command line ready to run a deployment script through it. But there are times when it is worth taking a step beyond.

One problem with the architecture of most SQL clients is they connect to a database, send off a SQL statement and do nothing until the database responds back with an answer. That's a great model when (more...)
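The excerpt stops before showing the author's approach, so the following is only one possible sketch of running several sessions at once from Python: a thread pool where each worker opens its own connection and blocks on its own statement. The in-memory sqlite3 connections and the time.sleep() delay are stand-ins for a real database and a slow query, and the job names and SQL are made up.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor

def run_statement(job):
    """Open a separate session, run one statement, and return its result."""
    name, sql, delay = job
    conn = sqlite3.connect(":memory:")  # stand-in for a real database session
    try:
        time.sleep(delay)  # simulates waiting on a slow database
        return name, conn.execute(sql).fetchone()[0]
    finally:
        conn.close()

# Hypothetical jobs: (label, SQL statement, simulated wait in seconds).
jobs = [
    ("first", "SELECT 1 + 1", 0.2),
    ("second", "SELECT 10 * 10", 0.2),
    ("third", "SELECT length('python')", 0.2),
]

started = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    results = dict(pool.map(run_statement, jobs))
elapsed = time.perf_counter() - started
print(results, round(elapsed, 2))
```

Because the three waits overlap, the elapsed time should be close to the longest single wait rather than the sum of all three.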

Wanted: RDBMS superpower summary for app developers

At last night's WWCode Cincinnati panel, I recommended that developers talk to their DBA about what advanced capabilities their RDBMS can offer, so that they don't end up reimplementing functionality in the app that is already available (better and more efficiently) in the database itself. Devs can waste a lot of effort by thinking of databases as dumb, inert data boxes.

I was asked an excellent question: "Where can a dev quickly familiarize herself with (more...)

IoT Hackathon Part IV : Using Web Services to send Sensordata

In the previous 3 posts, building towards the eProseed IoT Hackathon, I described how to set up your Raspberry Pi and how to use the GrovePi sensors. The example used is a small weather station that reads temperature and humidity and shows the readings on a display. That is all very nice; however, the data remains local on the Raspberry Pi, so there is nothing that we can do with this information.

Code Studio rocks; diversity does, too

If you want to quickly get some kids introduced to computer programming concepts, you could do a lot worse than using Code Studio. That's what I did the last couple of weeks - took two hours to lightly shepherd the Dayton YWCA day camp through a programming intro.

It's really well-organized and easy to understand - frankly, it pretty much drives itself. It's based on block-dragging for turtle graphics and/or simple 2D games, (more...)

Generating Diceware Passwords in Python

Today I’m going back to a theme from a post last year and looking at generating passwords with my favourite programming language. A tweet from Simon Brunning pointed me to Micah Lee’s article at The Intercept and my first thought was to write a function to do this in Python. So here it is:

def generate_diceware_password(word_count=6):
    import random
    word_dict = {}
    passphrase = []
    with open('diceware.wordlist.andy.txt') as f:
        for line in f. (more...)
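Since the function above is cut off, here is a self-contained sketch of the Diceware idea, written under a few assumptions rather than as a reconstruction of the original. It swaps the random module for secrets, which is designed for security-sensitive randomness. It expects a word list with one five-digit dice key and one word per line, and skips any other lines. The placeholder word list generated below only exists so the sketch runs without a file.

```python
import itertools
import secrets

def load_wordlist(lines):
    """Map five-digit dice keys (e.g. '16655') to words, skipping other lines."""
    words = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 5 and set(parts[0]) <= set("123456"):
            words[parts[0]] = parts[1]
    return words

def roll_key(dice=5):
    """Simulate rolling the dice and join the faces into a lookup key."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(dice))

def generate_diceware_password(words, word_count=6):
    """Pick word_count words, one simulated five-dice roll per word."""
    return " ".join(words[roll_key()] for _ in range(word_count))

# Placeholder word list covering all 6**5 keys; a real Diceware list goes here.
placeholder_lines = [
    "{}\tword{}".format("".join(key), i)
    for i, key in enumerate(itertools.product("123456", repeat=5))
]
words = load_wordlist(placeholder_lines)
passphrase = generate_diceware_password(words)
print(passphrase)
```

With a real list, something like `with open('diceware.wordlist.andy.txt') as f: words = load_wordlist(f)` replaces the placeholder, assuming the file uses the key-and-word layout described above.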


I've never had a tool I really liked that would extract a chunk of a large production database for testing purposes while respecting the database's foreign keys. This past week I finally got to write one: rdbms-subsetter.

rdbms-subsetter postgresql://user:passwd@host/source_db postgresql://user:passwd@host/excerpted_db 0.001

Getting it to respect referential integrity "upward" - guaranteeing every needed parent record would be included for each child row - took less than a day. Trying to get it to also guarantee (more...)
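To make the "upward" guarantee concrete, here is a small in-memory sketch: whenever a child row is chosen, every parent row it references is pulled in too, recursively. This is an illustration of the idea only, not how rdbms-subsetter is implemented. The schema, the data and the function names are all hypothetical.

```python
import random

# Hypothetical schema: each table maps primary key -> row; rows hold foreign keys.
tables = {
    "customer": {1: {"name": "Ann"}, 2: {"name": "Bob"}, 3: {"name": "Cy"}},
    "orders": {
        10: {"customer_id": 1},
        11: {"customer_id": 3},
        12: {"customer_id": 3},
    },
    "order_line": {
        100: {"order_id": 10},
        101: {"order_id": 11},
        102: {"order_id": 12},
        103: {"order_id": 12},
    },
}

# Child table -> list of (foreign key column, parent table).
foreign_keys = {
    "orders": [("customer_id", "customer")],
    "order_line": [("order_id", "orders")],
}

def add_with_parents(table, pk, subset):
    """Add one row to the subset, then recursively add every parent it references."""
    if pk in subset.setdefault(table, {}):
        return  # already included, so its parents are included as well
    row = tables[table][pk]
    subset[table][pk] = row
    for column, parent_table in foreign_keys.get(table, []):
        add_with_parents(parent_table, row[column], subset)

def take_subset(table, fraction, rng):
    """Sample a fraction of one table and include all required parent rows."""
    subset = {}
    keys = sorted(tables[table])
    sample_size = max(1, round(len(keys) * fraction))
    for pk in rng.sample(keys, sample_size):
        add_with_parents(table, pk, subset)
    return subset

subset = take_subset("order_line", 0.5, random.Random(0))
print(subset)
```

Every sampled order line arrives with its order, and every order with its customer, so no foreign key in the subset points at a missing row.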

%sql: To Pandas and Back

A Pandas DataFrame has a nice to_sql(table_name, sqlalchemy_engine) method that saves itself to a database.

The only trouble is that coming up with the SQLAlchemy Engine object is a little bit of a pain, and if you're using the IPython %sql magic, your %sql session already has an SQLAlchemy engine anyway. So I created a bogus PERSIST pseudo-SQL command that simply calls to_sql with the open database connection:

%sql PERSIST mydataframe

The result is (more...)
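For reference, this is roughly what the underlying to_sql call looks like outside of IPython. The sketch passes a plain sqlite3 connection instead of an SQLAlchemy engine, which pandas also accepts, so that it runs with nothing but pandas installed. The column names and values are made up for the example.

```python
import sqlite3

import pandas as pd

# Made-up data; the DataFrame name matches the one used in the %sql example.
mydataframe = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

conn = sqlite3.connect(":memory:")

# Save the DataFrame as a table named after it, without the index column.
mydataframe.to_sql("mydataframe", conn, index=False)

# Read it back with SQL to confirm the round trip.
round_trip = pd.read_sql_query(
    "SELECT name, score FROM mydataframe ORDER BY name", conn
)
conn.close()
print(round_trip)
```

With an SQLAlchemy engine the call is the same, with the engine passed in place of the sqlite3 connection.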