MySQL on Fedora 27

While updating my class image to Fedora 27, I noticed that it installed the Akonadi Server. The documentation on the Akonadi server lacked some straightforward documentation. It also offered a bundled set of software that limited how to approach MySQL development.

So, I removed all those packages with the following syntax:

dnf remove `rpm -qa | grep akonadi`

After removing those Akonadi packages, I installed the MySQL Community Edition from the Fedora repo with this (more...)

Beware (Sort-Of) Ambiguous Column Names In Sub-Selects

This morning I received an UPDATE statement from a developer that I was testing. It ran without errors but then I saw that it updated 5 rows when it should have only updated 3. The reason gave me a little shock so I whipped up a simple test-case to reproduce the problem.

First we create two tables:

    id int
    , name varchar(30)
CREATE TABLE (more...)

Global variable wsrep_OSU_method=RSU ignored on MariaDB Galera Cluster 10.0.32

Galera Cluster is great solution for application clustering but it has some limitations. One of the most critical limitations is how Galera Cluster performs model changes.

From Galera Cluster documentation:
What’s happening is a side effect of a multi-master cluster with several appliers. 
The cluster needs to control when a DDL statement ends in relation to other transactions,
in order to deterministically detect conflicts and schedule parallel appliers.
Effectively, the DDL statement (more...)

Using Spark to join data from CSV and MySQL Table

Yesterday, I explained how we can access MySQL database from Zeppelin which comes with Oracle Big Data Cloud Service Compute Edition (BDCSCE). Although we can use Zeppelin to access MySQL, we still need something more powerful to combine data from two different sources (for example data from CSV file and RDBMS tables). Spark is a great choice to process data. In this blog post, I’ll write a simple PySpark (Python for Spark) code which will (more...)

Optimizer Strategien für Subqueries für Oracle, Postgres und MySQL

Chris Antognini hat in seinem Blog eine sehr spannende Untersuchung zur Frage durchgeführt: wie gut kommen die Optimizer unterschiedlicher RDBMS mit relativ einfachen Subqueries klar? In der umfangreichen Untersuchung betrachtet er sechs unterschiedliche Subquery-Typen:
  • Scalar subqueries with equality predicate
  • Scalar subqueries with inequality predicate
  • Uncorrelated subqueries with either IN or EXISTS
  • Uncorrelated subqueries with either NOT IN or NOT EXISTS
  • Correlated subqueries with either IN or EXISTS
  • Correlated subqueries with either NOT IN or NOT (more...)

How Well a Query Optimizer Handles Subqueries?

At the beginning of December, at the UKOUG Tech17 conference in Birmingham (GB), I presented a comparison of the query optimizers of MySQL 8.0.3 and PostgreSQL 10.1. One of the things I talked about is their ability to handle subqueries. I summarized my findings with the following sentence:

Simple sub-queries that are not correctly optimized were observed.

It goes without saying that such a sentence leaves a lot of questions open. After (more...)

Oracle Week 2017: Docker Concepts for Oracle/MySQL DBAs and DevOps (slides)

Oracle Week in Israel is a reason for celebration. Every year (and for the past 24 years), between a 1500 and 2000 Oracle professional are gathered for a 5 days conference on Oracle related educational topics. Each day is a full-day seminar on a specific topic – 9am to 4:30pm talking about something related to Oracle technologies. This year, I had 4 full-day back-to-back seminars – which is way more than the average speaker. Some (more...)

Beware of intensive slow query logging when using – log_queries_not_using_indexes

MySQL slow query log is great for identifying slow queries that are good candidates for optimisation. Slow query logging is disabled by default, but it is activated by DBA's or developers on most environments.

You can use slow query log to record all the traffic but be careful with this action as logging all traffic could be very I/O intensive and could have negative impact on general performance. It is recommended to record all traffic (more...)

Delete large amounts of data on Galera Cluster using pt-archiver

Galera Cluster is excellent virtually synchronous multi-master database cluster. It has many benefits which you can check on GaleraCluster.
But beside benefits it has some limitations and one of them is handling large transactions.

Large replication data sets could degrade performance of whole cluster causing cluster freezing, increased memory consumption, crashing nodes, etc. To avoid this issues it is recommended to split large transactions into smaller chunks.

In this post I want to show you (more...)

Wanted: RDBMS superpower summary for app developers

At last night's WWCode Cincinnati panel, I recommended that developers talk to their DBA about what advanced capabilities their RDBMS can offer, so that they don't end up reimplementing functionality in the app that are already available (better and more efficiently) in the database itself. Devs can waste a lot of effort by thinking of databases as dumb, inert data boxes.

I was asked an excellent question: "Where can a dev quickly familiarize herself with (more...)

Couchbase 4.0 and related subjects

I last wrote about Couchbase in November, 2012, around the time of Couchbase 2.0. One of the many new features I mentioned then was secondary indexing. Ravi Mayuram just checked in to tell me about Couchbase 4.0. One of the important new features he mentioned was what I think he said was Couchbase’s “first version” of secondary indexing. Obviously, I’m confused.

Now that you’re duly warned, let me remind you of aspects of (more...)

Confusion and problems with lost+found directory in MySQL/Galera cluster configuration

The lost+found directory is filesystem directory created at root level of mounted drive for ext file systems. It is used by file system check tools (fsck) for file recoveries.

In MySql world it can cause confusion or possible problems with synchronisation in Galera cluster configuration.

Let’s check some examples.

I have MySQL database with datadir=/data in configuration file. I have deleted lost+found directory and restarted MySQL service.

When I list my databases this is result:


Test MySQL on AWS quickly

Using sysbench to performance test AWS RDS MySQL hardware is an easy three step  operation. 

Sysbench creates synthetic tests and they are done on a 1 mil row 'sbtest' table that sysbench creates in the MySQL database you indicate. The test doesn't intrude with your database schema, and it doesn't use your data, so it is quite safe. The test is an OLTP test trying to simulate event operations in the database as it (more...)

MariaDB – Measure Replicaton Lag and Check / Fix Replication Inconsistencies using Percona tools

Percona Toolkit is collection of command-line tools to perform many MySQL tasks like creating backups, finding duplicate indexes, managing replication, etc.

In this post I will talk about how to measure replication lag and check/fix replication inconsistencies with this tools:

I am using environment from previous blog post.
Master-Master replication with MariaDB 10.0.16 database on Debian 7.

Install Percona Toolkit on both nodes:

$ sudo wget

$ (more...)

MariaDB(MySQL) Master-Master Replication

The simplest and probably most common replication method is master-slave replication. Basically, data is replicated from master database to the slave. In case of master database failure you must get the slave database up-to-date before failover and then promote slave to be new master.

Another method is to set up replication in both directions called master-master replication. But you must be aware that this setup brings some potential issues as data changes are happening on (more...)

MySQL Locking

Have you ever wondered if MySQL does Row Level Locking? Or wondered why are you getting the error:

ERROR 1205 (HY000) : Lock wait timeout exceeded; try restarting transaction

You get the error because your allocated time to hold a DML lock in a transaction exceeds the set limit. Usually the default limit to hold a DML row lock, set by innodb_lock_wait_timeout db parameter, is 50 seconds. If your transaction doesn't commit/rollback within 50 seconds (more...)

Thoughts and notes, Thanksgiving weekend 2014

I’m taking a few weeks defocused from work, as a kind of grandpaternity leave. That said, the venue for my Dances of Infant Calming is a small-but-nice apartment in San Francisco, so a certain amount of thinking about tech industries is inevitable. I even found time last Tuesday to meet or speak with my clients at WibiData, MemSQL, Cloudera, Citus Data, and MongoDB. And thus:

1. I’ve been sloppy in my terminology around “geo-distribution”, in (more...)


I've never had a tool I really liked that would extract a chunk of a large production database for testing purposes while respecting the database's foreign keys. This past week I finally got to write one: rdbms-subsetter.

rdbms-subsetter postgresql://user:passwd@host/source_db postgresql://user:passwd@host/excerpted_db 0.001

Getting it to respect referential integrity "upward" - guaranteeing every needed parent record would be included for each child row - took less than a day. Trying to get it to also guarantee (more...)

%sql: To Pandas and Back

A Pandas DataFrame has a nice to_sql(table_name, sqlalchemy_engine) method that saves itself to a database.

The only trouble is that coming up with the SQLAlchemy Engine object is a little bit of a pain, and if you're using the IPython %sql magic, your %sql session already has an SQLAlchemy engine anyway. So I created a bogus PERSIST pseudo-SQL command that simply calls to_sql with the open database connection:

%sql PERSIST mydataframe

The result is (more...)

auto-generate SQLAlchemy models

PyOhio gave my lightning talk on ddlgenerator a warm reception, and Brandon Lorenz got me thinking, and PyOhio sprints filled my with py-drenaline, and now ddlgenerator can inspect your data and spit out SQLAlchemy model definitions for you:

$ cat merovingians.yaml
name: Clovis I
from: 486
to: 511
name: Childebert I
from: 511
to: 558
$ ddlgenerator --inserts sqlalchemy merovingians.yaml

from sqlalchemy import create_engine, Column, Integer, Table, Unicode
engine (more...)