Athena – SQL to get date of next Monday

Uncategorized
| Nov 18, 2019

I was recently asked how to get date of next Monday irrespective of which day of the week sql is executed. So thought to share it, in-case someone else has such requirement.

select date_add('day', 8 - extract(day_of_week from current_date), current_date)

 
Or,

select date_trunc('week', current_date) + interval '7' day;

Happy learning 🙂

 

How to Use AWS S3 bucket for Spark History Server

Uncategorized
| Nov 18, 2019

Since EMR Version 5.25, it’s possible to debug and monitor your Apache Spark jobs by logging directly into the off-cluster, persistent, Apache Spark History Server using the EMR Console. You do not need to anything extra to enable it, and you can access the Spark history even after the cluster is terminated. The logs are available for active clusters and are retained for 30 days after the cluster is terminated.

Although this is a (more...)

Getting started with #Exasol on #AWS

Uncategorized
| Nov 13, 2019

It’s amazingly easy to run an Exasol Cluster on Amazon Web Services (AWS).

Subscribe Exasol in AWS marketplace

After having registered and having logged in to your AWS account, go to the AWS marketplace and search for Exasol:

Click on the Exasol Single Node and Cluster BYOL link and then on Continue to Subscribe:

After having reviewed the T&C, click on Accept Terms. It shows this message afterwards:

Create Key Pair

Now login (more...)

State management in serverless functions – connection pooling in AWS Lambda leveraging memoized functions

Uncategorized
| Oct 31, 2019

The problem

Setting up a datastore connection is an expensive process. The amount of available connections is often limited and creating a connection costs precious time – and and in serverless functons time literally means money. Especially in something like an event processing Lambda, where thousands of events can call your function within a couple of seconds you would rather not set up a connection for each individual invocation. Thankfully, you don’t have to! By (more...)

Lambda Function to Resize EBS Volumes of EMR Nodes

Uncategorized
| Oct 30, 2019

I have to start by saying that you should not use EMR as a persistent Hadoop cluster. The power of EMR lies in its elasticity. You should launch an EMR cluster, process the data, write the data to S3 buckets, and terminate the cluster. However, we see lot of AWS customers use the EMR as a persistent cluster. So I was not surprised when a customer told that they need to resize EBS volume automatically (more...)

Query a HBASE table through Hive using PySpark on EMR

Uncategorized
| Oct 15, 2019

In this blog post, I’ll demonstrate how we can access a HBASE table through Hive from a PySpark script/job on an AWS EMR cluster. First I created an EMR cluster (EMR 5.27.0, Hive 2.3.5, Hbase 1.4.0). Then I connected to the master node, executed “hbase shell”, created a HBASE table, and inserted a sample row:

create 'mytable','f1'
put 'mytable', 'row1', 'f1:name', 'Gokhan'

I logged in to hive and created (more...)

Amazon QLDB and the Missing Command Line Client

Uncategorized
| Sep 10, 2019

Amazon Quantum Ledger Database is is a fully managed ledger database which tracks all changes of user data and maintains a verifiable history of changes over time. It was announced at AWS re:Invent 2018 and now available in five AWS regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo).

You may ask why you would like to use QLDB (a ledger database) instead of using your traditional (more...)

Sample AWS Lambda Function to Monitor Oracle Database

Uncategorized
| Sep 4, 2019

I wrote a very simple AWS Lambda function to demonstrate how to connect an Oracle database, gather the tablespace usage information, and send these metrics to CloudWatch. First, I wrote this lambda function in Python and then I had to re-write it in Java. As you may know, you need to use cx_oracle module to connect Oracle Databases with Python. This extension module requires some libraries which are shipped by Oracle Database Client (oh God! (more...)

Costs in Multi-Cloud

Uncategorized
| Aug 28, 2019

Over the last couple of years we have seen growing references to multi-cloud. That is to say, people are recognizing that organisations, particularly larger ones are ending up with cloud services for many different vendors. This at-least in part has come from where departments within an organization can by meaningful resources within their local budgets.

Whilst there is a competitive benefit of the recent partnership agreement between Microsoft and Oracle given the market margin AWS (more...)

DevOps for Oracle DBA

Uncategorized
| Aug 24, 2019
DevOps is natural evolution for Oracle database administrators or sysadmins of any kind. The key to remain relevant in the industry is to embrace DevOps these days and in near future.

The good news is that if you are an Oracle DBA, you already have the solid foundation. You have worked with the enterprise, world class database system and are aware of high availability, disaster recovery, performance optimization, and troubleshooting. Having said that, there is (more...)

S3 – fatal error: An error occurred (404) when calling the HeadObject operation

Uncategorized
| Aug 5, 2019

Make sure to use –recursive parameter.

[desktop: test]:${PWD}> aws s3 cp s3://demo-beta/dw/user/dt=2019-07-30/ /tmp/dw/
fatal error: An error occurred (404) when calling the HeadObject operation: Key "dw/user/dt=2019-07-30/" does not exist

[desktop: test]:${PWD}> aws s3 cp s3://demo-beta/dw/user/dt=2019-07-30/ /tmp/dw/ --recursive
download: s3://demo-beta/dw/user/dt=2019-07-30/part-00002-fd866c-238-489-a44-739f1d04-c000.snappy.parquet to ../../../tmp/dw/part-00002-fd866c-238-489-a44-739f1d04-c000.snappy.parquet

From Documentation –

–recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.

Expanding array to multiple rows – Athena

Uncategorized
| Jun 13, 2019

A single row in Athena table is stored as —

select id, course, date from demo.course_tab where id='1234567892'

 	id	course	                                                                 date
 1234567892	[95c3c1bc5873, 2e345b2eb678, 027b02599f4a, 8695a580520b, 5d453355d415, cdcc7682070b]	2019-06-13

The datatype for course column is array(string). Now, how can you get the output in below format –

 	id	        course          date

1	1234567892	95c3c1bc5873	2019-06-13
2	1234567892	2e345b2eb678	2019-06-13
3	1234567892	027b02599f4a	2019-06-13
4	1234567892	8695a580520b	2019-06-13
5	1234567892	5d453355d415	2019-06-13
6	1234567892	cdcc7682070b	2019-06-13

(more...)

Identifying PostgreSQL Bloat and fixing it without downtime

Uncategorized
| Apr 20, 2019

Original Post can be viewed at Identifying PostgreSQL Bloat and fixing it without downtime

MVCC (Multi-Version Concurrency Control) feature allows databases to provide concurrent access to data. This allows each SQL statement to see a snapshot of data as it was some time ago, regardless of the current state of the underlying data. This prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same…

AskDba.org Weblog

Understanding Nested Lists Dictionaries of JSON in Python and AWS CLI

Uncategorized
| Apr 20, 2019

After lots of hair pulling, bouts of frustration, I was able to grasp this nested list and dictionary thingie in JSON output of AWS cli commands such as describe-db-instances and others. If you run the describe-db-instances for rds or describe-instances for ec2, you get a huge pile of JSON mumbo-jumpo with all those curly and square brackets studded with colons and commas. The output is heavily nested.


For example, if you do :

aws rds (more...)

S3 bucket size using aws cli

Uncategorized
| Apr 18, 2019

To get the total objects count and total size for folder within S3 bucket recursively –


aws s3 ls --summarize --human-readable --recursive s3://content-dev/mywork/dumb | grep 'Total'

Total Objects: 1922
   Total Size: 297.4 GiB

AWS: Accessing S3 buckets from the internet and from ec2 instances (2)

Uncategorized
| Apr 10, 2019
After accessing a S3 bucket from an ec2 instance by adding an IAM role to my EC2 instance, i tried to get the access from an ec2 instance by using a AWS endpoint.

First step: Launch an ec2 instance within your VPC and try to access your S3 bucket:

[ec2-user@ip-172-31-30-93 ~]$ wget https://s3-eu-west-1.amazonaws.com/my.webtest/website.json
--2019-02-02 18:29:28--  https://s3-eu-west-1.amazonaws.com/my.webtest/website.json
Auflösen des Hostnamen »s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)«... 52.218. (more...)

AWS CloudFormation Linter Installation and Quick Introduction

Uncategorized
| Apr 4, 2019
AWS Cloudformation is an evolving managed service which facilitates infrastructure as a code in the cloud. What it means is that you can create AWS resources like EC2 instances, S3 buckets and many more just by writing code instead of using GUI console.



For instances, if you want to create 100 EC2 instances for production, and then later you have to create same for development and for testing and then may be later on, you (more...)

Step by Step Troubleshooting and Validation of AWS CloudFormation Yaml template

Uncategorized
| Apr 1, 2019
CloudFormation (CFN) is infrastructure as a code service of AWS. You just tell CFN your desired state of resources and it creates them in order while resolving dependencies. You mention these resources in a file called as template which can be written in YAML or JSON. YAML being bit more user readable, is widely used now. YAML is great but you have to be aware of its indentation.


Following is a step by step troubleshooting (more...)

AWS: overview of all postings

Uncategorized
| Mar 29, 2019
After nearly an half year of looking into AWS, here a summary of all postings related to that topic:
ECS & S3 (Computing & Storage):

When AWS SCT Unable to Connect to MySQL in RDS or EC2

Uncategorized
| Mar 22, 2019
AWS Schema Conversion Tool (SCT) is one of the must tool for a successful migration of databases to AWS RDS.

If you are trying to connect to MySQL hosted on EC2 instance or RDS and unable to make the connection despite of setting the security groups correctly and making sure that port, hostname, username and password are right, then first check the log of SCT.




2019-03-22 19:40:16.866 [   1]     GENERAL INFO    global_settings:
(more...)