Splitting strings before they’re CLOBs!

After I tokenized two CLOBs in response to a forum question, Andy Sayer mentioned that they were probably files first. So I smacked my forehead and used external tables.

Splitting strings when they are CLOBs

Like many of you, I have enjoyed using SQL to make data processing go hundreds of times faster. My new record is: 7000 times faster :-)

Top-N again: fetch first N rows only vs rownum

Three interesting myths about rowlimiting clause vs rownum have recently been posted on our Russian forum:

  1. TopN query with rownum<=N is always faster than "fetch first N rows only" (ie. row_number()over(order by ...)<=N)
  2. “fetch first N rows only” is always faster than rownum<=N
  3. “SORT ORDER BY STOPKEY” stores just N top records during sorting, while “WINDOW SORT PUSHED RANK” sorts all input and stores all records sorted in memory.

Interestingly that after Vyacheslav posted first statement as an axiom and someone posted old tests(from 2009) and few people made own tests which showed that “fetch first N rows” is about 2-3 times faster than the query with rownum, the final decision was that “fetch first” is always faster.

First of all I want to show that statement #3 is wrong and “WINDOW SORT PUSHED RANK” with row_number works similarly as “SORT ORDER BY STOPKEY”:
It’s pretty easy to show using sort trace:
Let’s create simple small table Tests1 with 1000 rows where A is in range 1-1000 (just 1 block):

create table test1(a not null, b) as
  select level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;

And we can see from the trace files that both queries did the same number of comparisons:

rownum:
----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            10
Total number of comparisons performed     999
  Comparisons performed by in-memory sort 999
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]
row_number

----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            10
Total number of comparisons performed     999
  Comparisons performed by in-memory sort 999
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]

Ie. each row (except first one) was compared with the biggest value from top 10 values and since they were bigger than top 10 value, oracle doesn’t compare it with other TopN values.

And if we change the order of rows in the table both of these queries will do the same number of comparisons again:

from 999 to 0

create table test1(a not null, b) as
  select 1000-level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;


ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;

[collapse]
rownum

----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            1000
Total number of comparisons performed     4976
  Comparisons performed by in-memory sort 4976
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]
row_number

----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            1000
Total number of comparisons performed     4976
  Comparisons performed by in-memory sort 4976
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]

We can see that both queries required much more comparisons(4976) here, that’s because each new value is smaller than the biggest value from the topN and even smaller than lowest value, so oracle should get right position for it and it requires 5 comparisons for that (it compares with 10th value, then with 6th, 3rd, 2nd and 1st values from top10). Obviously it makes less comparisons for the first 10 rows.

Now let’s talk about statements #1 and #2:
We know that rownum forces optimizer_mode to switch to “first K rows”, because of the parameter “_optimizer_rownum_pred_based_fkr”

SQL> @param_ rownum

NAME                               VALUE  DEFLT  TYPE      DESCRIPTION
---------------------------------- ------ ------ --------- ------------------------------------------------------
_optimizer_rownum_bind_default     10     TRUE   number    Default value to use for rownum bind
_optimizer_rownum_pred_based_fkr   TRUE   TRUE   boolean   enable the use of first K rows due to rownum predicate
_px_rownum_pd                      TRUE   TRUE   boolean   turn off/on parallel rownum pushdown optimization

while fetch first/row_number doesn’t (it will be changed after the patch #22174392) and it leads to the following consequences:
1. first_rows disables serial direct reads optimization, that’s why the tests with big tables showed that “fetch first” were much faster than the query with rownum.
So if we set “_serial_direct_read”=always, we get the same performance in both tests (within the margin of error).

2. In cases when index access (index full scan/index range scan) is better, CBO differently calculates the cardinality of underlying INDEX FULL(range) SCAN:
the query with rownum is optimized for first_k_rows and the cardinality of index access is equal to K rows, but CBO doesn’t reduce cardinality for “fetch first”, so the cost of index access is much higher, compare them:

rownum
SQL> explain plan for
  2  select *
  3  from (select * from test order by a,b)
  4  where rownum<=10;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   390 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |            |       |       |            |          |
|   2 |   VIEW                        |            |    10 |   390 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M|     4   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |    10 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)

[collapse]
fetch first

SQL> explain plan for
  2  select *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  1 |  VIEW                    |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000K|    12M|    22M|  5438   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | TEST |  1000K|    12M|       |   690   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]
fetch first + first_rows

SQL> explain plan for
  2  select/*+ first_rows */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                         |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY       |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]
fetch first + index

SQL> explain plan for
  2  select/*+ index(test (a,b)) */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                         |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY       |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]

So in this case we can add hints “first_rows” or “index”, or install the patch #22174392.

ps. I thought to post this note later, since I hadn’t time enough to add other interesting details about the different TopN variants, including “with tie”, rank(), etc, so I’ll post another note with more details later.

Polymorphic Table Functions – Part 2

In my previous post I wrote about a possible use-case for Polymorphic Table Functions. I also wrote that I had some ideas to implement extra functionality to make the function more generic.
After attending DOAG Konferentz und ausstellung 2018 and UKOUG Tech18 I finally came around to publishing my new version.
Lets start off with a table to hold my data, I chose to use the NATO phonetic alphabet for this:

create table t_alphabet
(thecount  (more...)

Polymorphic Table Functions

I have been working on a presentation on Polymorphic Table Functions. During this time I was looking for a real use case for Polymorphic Table Functions. I came up with an example which is not very useful in real life, but very useful to explain the technique.
At my current job I came across a piece of code that I had to copy and adjust to fit the needs for that specific case. The idea (more...)

Chart your SQL direct with Apache Zeppelin Notebook

Do you want a notebook where you can write some SQL queries to a database and see the results either as a chart or table? 

As long as you can connect to the database with a username and have the JDBC driver, no need to transfer data into spreadsheets for analysis, just download (or docker) and use Apache Zeppelin notebook and chart your SQL directly! 


I was impressed by the ability of Apache (more...)

ODC Appreciation Day : Pattern Matching in SQL

Here’s my contribution to the ODC Appreciation Day.

Pattern Matching in SQL, using the MATCH_RECOGNIZE clause, is one of my favorite features, but only recently I’ve used it “for real”.
MATCH_RECOGNIZE allows us to perform enhanced analysis of row sequences, and to detect sequences that match complex patterns.
This feature gave a significant boost to the analytical capabilities of SQL. It enables solving various types of problems in a simpler way than before, in much (more...)

List of Bank Holidays For England in SQL Format

First off, create a table to store the bank holiday values. You may need to adjust this slightly depending on your SQL server technology being used (this was tested on MySQL Server).  [crayon-5bb4e0b3b1755979540269/] Insert the bank holiday values below. This table is currently for 2012 up to 2019 for England and Wales.  [crayon-5bb4e0b3b175d512369946/] For a

MySQL/ MariaDB Error Code: 1329. No data – zero rows fetched, selected, or processed

The above error can occur when calling a cursor results in no rows, or no more rows if called in a loop. Whilst the error message is descriptive about what has happened physically, you may wish to catch the error so that you can do something else, or simply replace the generic database error with

How to exit from a MySQL/ MariaDB Stored Procedure/ Function Prematurely

MySQL and MariaDB enable you to define your own error conditions and to report back to the SQL client both a return code and an error message. As soon as you raise the condition then MySQL/ MariaDB will halt any further execution of the code and report the error back to the client. This can

Remove duplicate from APEX collection

One of my favourite SQL analytic functions is row_number(), and I've used it in the past to identify, then remove duplicates.

In this case, I have an APEX collection that represents a session based view history of products/people/events, or whatever your users might be browsing.

I've created an option to consolidate that view history, and remove any record you might have opened more than once.

Collections are a little hard to play with outside (more...)

ORA-01723 für DATE-Angaben

Nach langer Zeit mal wieder etwas Selbsterlebtes, das einen Eintrag zu verdienen scheint - wie man eine CTAS-Operation über das nls_date_format torpedieren kann:

SQL> alter session set nls_date_format = 'dd.mm.yyyy hh24:mi:ss';

Session altered.

SQL> create table t as select to_date(null) col1 from dual;
create table t as select to_date(null) col1 from dual
*
ERROR at line 1:
ORA-01723: zero-length columns are not allowed

SQL> alter session set nls_date_format = 'DD-MON-RR';

SQL> create table (more...)

JSON, BLOB column and Check Constraint

Last week I attended Neil Chandler's session on JSON during the POUG conference in beautiful Sopot, Poland.
The JSON Developer's Guide recommends using BLOB for storing data, and this is what Neil also recommended.
I was under the (FALSE!) impression that it was not possible to put an IS JSON check constraint on a BLOB column, simply because I tried once and got an exception. After the session I asked Neil if this was (more...)

Index Hints and Distributed Queries

There are two ways to specify indexes in optimizer hints: by the index name, or by the index’s (leading) columns:

The latter is usually preferred, as the writer’s intention is clearer, and it’s immune to changes of the index name.

Note: one can present the opposite argument, that specifying the index name is immune to changes of column names

Recently I’ve used the option that I prefer – specifying the column names and not the (more...)

Translating Chinese to English in SQL with Microsoft Translator

In Oracle, I had a table of data which had some Chinese words that I needed to translate into English on the fly, in SQL…this is how I did that…

Microsoft have a translator facility here with the Translator Text API v3.0 to allow you to call it programmatically. I’m using Microsoft as I’m currently working on Azure – of course, there are other translation facilities available.

The API has a translate method which one (more...)

The importance of being aliased

Uncategorized
| Jul 20, 2018
Always check out the original article at http://www.oraclequirks.com for latest comments, fixes and updates. I was writing a query returning the indexes to be monitored for a list of schemas for those tables having fewer than 5 indexes (not counting LOB indexes) and I was surprised to see that it returned no rows. After checking what's wrong I realized that I had specified the wrong column for

EBR – Part 10: Data Dictionary Views for Editioning Views

This is part 10 of a post series about Oracle Edition-Based Redefinition.

Visit the index page for all the parts of the series

Introduction

In a previous post I wrote about Editioning Views and their significant role in online application upgrades using EBR.
In this post we’ll see how editioning views are represented in the data dictionary views.

I’ll discuss only the USER_* views, but everything is true for the corresponding ALL_*, DBA_* and CDB_* (more...)

ANSI dates make life easier

This post is one of a series on what I learned while not at Kscope18.

Dimitri mentioned that he learned about the ANSI date format that allows you to return a date with the expression.


Which means this
date '2018-06-10'

Is the same as
to_char('10-Jun-2018','DD-MON-YYYY')

And you'll never want to type (more...)

Wrong Results with IOT, Added Column and Secondary Index

I found a “wrong results” bug yesterday, easily reproduced in 11g, 12c and 18c.

In short, we may get wrong results under the following circumstances:

  • We have an Index-Organized Table (IOT) with multi-column primary key, populated with rows
  • The table has a secondary index on part of the primary key columns
  • We add another column to the existing IOT
  • We select from the IOT while accessing it via the secondary index

Following is a simple (more...)

Long-running delete of synopses during table stats gathering

Oracle 11.2.0.4

I recently encountered one long-running session and several blocked sessions, all performing the same delete operation.

--SQL_ID dsf21kcbjqfyk
DELETE FROM SYS.WRI$_OPTSTAT_SYNOPSIS$ WHERE BO# = :B2 AND INTCOL# = :B1;

One session had been running for several hours, and the other sessions were all blocked by a row lock from the first session. Unfortunately, each of these sessions was trying to gather stats on new partitions of tables as part (more...)