Being generous to the optimizer

In a perfect world, the optimizer would reach out from the server room, say to us: “Hey, lets grab a coffee and have a chat about that query of yours”. Because ultimately, that is the task we are bestowing on the optimizer – to know what our intent was in terms of running a query in a way that meets the performance needs of our applications. It generally does a pretty good job even without (more...)

Hacking together faster INSERTs

Most developers tools out there have some mechanism to unload a table into a flat file, either as CSV, or Excel, and some even allow you to unload the data as INSERT statements. The latter is pretty cool because it’s a nice way of having a self-contained file that does not need Excel or DataPump or any tool additional to the one you’re probably using to unload the data.

SQLcl and SQL Developer are perhaps (more...)

HiveMall: Transform Categorical features to Numerical

HiveMall is a machine learning library that sits on top of Hive and provides SQL interface to wide range of data preparation and machine learning algorithms.

A common task faced for many machine learning exercises is to convert the data from the format it is captured in (raw data) into a format that is required by the machine learning algorithms. Most ML tools will either have functionality built into the algorithms to do this automatically (more...)

Top time-consuming predicates from ASH

Sometimes it might be useful to analyze top time-consuming filter and access predicates from ASH, especially in cases when db load is spread evenly enough by different queries and top segments doesn’t show anything special, except usual things like “some tables are requested more often than others”.
Of course, we can start from analysis of SYS.COL_USAGE$: col_usage.sql

col_usage.sql
col owner format a30
col oname format a30 heading "Object name"
col cname format  (more...)

Time Series Forecasting in Oracle – Part 2

This is the second part about time-series data modeling using Oracle. Check out the first part here.

In this post I will take a time-series data set and using the in-database time-series functions model the data, that in turn can be used for predicting future values and trends.

The data set used in these examples is the Rossmann Store Sales data set. It is available on Kaggle and was used in one of their competitions.

(more...)

Visualising SQL Analytics Rolling Count with OracleJET in APEX

Back in around 2005, before the time of smart phones, I had some data.

I can't remember what the data was, but I was told that for it to be valid, it should roughly form a bell curve.

Sure, I'm sure I could have aggregated it, exported it to Excel, and plotted it to a graph, but this was 2005. There was no SQL Developer where I could copy & paste the results directly into (more...)

My #OUGN19 Presentations

As promised, here are the two presentations I gave on the OUGN 2019 Boat conference. I had a great time with old friends and new, and managed to have discussions about politics without anyone getting upset! Hope to see some of you again at OUG Ireland!

Function based tables

You've probably seen this somewhere already, no doubt from Connor, though I couldn't find much beyond Tim's post on pipelined functions - I can't find the right keywords to find related content.

I like table functions, so this will help me remember we no longer need to specify the table() operator in 18c (12.2).

12c

select * 
from table(
apex_string.split('A,B,C',',')
);

Result Sequence
---------------
A
B
C

12c> select * from apex_string.split('A,B,C',',');

(more...)

Mystery and imagination of DELETE RETURNING INTO

Uncategorized
| Feb 7, 2019
Always check out the original article at http://www.oraclequirks.com for latest comments, fixes and updates. Am I the only one who finds very odd this statement in the documentation of DELETE ... RETURNING INTO? "returning_clause This clause lets you return values from deleted columns, and thereby eliminate the need to issue a SELECT statement following the DELETE statement." I mean,

Why I Don’t Like ANSI Joins

Since Oracle started supporting ANSI joins in 9i, the Oracle community was divided into two groups: one loved it and tried to convince everyone to use it, and the other hated it and has never switched. As you probably understand from the title, I’m in the second group, and this is why. When using a … Continue reading "Why I Don’t Like ANSI Joins"

Top-N Queries

This is an old story. It started in 2015 when I found a problem with TOP-N query performance. I wrote a blog post about it and later an update. Here I want to show the full testcase and some updates after a twitter discussion. Background I don’t want to spend too much time writing the … Continue reading "Top-N Queries"

ORA-22905 when calling ODCI Table functions from PL/SQL

Back in 2015 I experimented using ODCITable* functions to dynamically parse delimited text (see blog post here.)

Now blog reader Daniel Cabanillas points out, that it raises an error when used in PL/SQL. Silly me - I had only tried it out in SQL (my favorite language) and had completely missed that it failed in PL/SQL.

Lucky for me, the classic workaround of dynamic SQL works here too 😁


Look in the old blog (more...)

Securing Oracle scripts with a wallet

It’s that time of year when I try to clean up some of my bad habits from the year before. This time, it’s plain-text passwords in SQL scripts. Here’s a way to make them a little more secure. You can move all your passwords into a secure wallet file stored on your PC. This is not the most secure option available (and you should keep the wallet in a private location) but it is better (more...)

Splitting strings before they’re CLOBs!

After I tokenized two CLOBs in response to a forum question, Andy Sayer mentioned that they were probably files first. So I smacked my forehead and used external tables.

Splitting strings when they are CLOBs

Like many of you, I have enjoyed using SQL to make data processing go hundreds of times faster. My new record is: 7000 times faster :-)

Top-N again: fetch first N rows only vs rownum

Three interesting myths about rowlimiting clause vs rownum have recently been posted on our Russian forum:

  1. TopN query with rownum<=N is always faster than "fetch first N rows only" (ie. row_number()over(order by ...)<=N)
  2. “fetch first N rows only” is always faster than rownum<=N
  3. “SORT ORDER BY STOPKEY” stores just N top records during sorting, while “WINDOW SORT PUSHED RANK” sorts all input and stores all records sorted in memory.

Interestingly that after Vyacheslav posted first statement as an axiom and someone posted old tests(from 2009) and few people made own tests which showed that “fetch first N rows” is about 2-3 times faster than the query with rownum, the final decision was that “fetch first” is always faster.

First of all I want to show that statement #3 is wrong and “WINDOW SORT PUSHED RANK” with row_number works similarly as “SORT ORDER BY STOPKEY”:
It’s pretty easy to show using sort trace:
Let’s create simple small table Tests1 with 1000 rows where A is in range 1-1000 (just 1 block):

create table test1(a not null, b) as
  select level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;

And we can see from the trace files that both queries did the same number of comparisons:

rownum:
----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            10
Total number of comparisons performed     999
  Comparisons performed by in-memory sort 999
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]
row_number

----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            10
Total number of comparisons performed     999
  Comparisons performed by in-memory sort 999
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]

Ie. each row (except first one) was compared with the biggest value from top 10 values and since they were bigger than top 10 value, oracle doesn’t compare it with other TopN values.

And if we change the order of rows in the table both of these queries will do the same number of comparisons again:

from 999 to 0

create table test1(a not null, b) as
  select 1000-level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;


ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;

[collapse]
rownum

----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            1000
Total number of comparisons performed     4976
  Comparisons performed by in-memory sort 4976
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]
row_number

----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            1000
Total number of comparisons performed     4976
  Comparisons performed by in-memory sort 4976
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]

We can see that both queries required much more comparisons(4976) here, that’s because each new value is smaller than the biggest value from the topN and even smaller than lowest value, so oracle should get right position for it and it requires 5 comparisons for that (it compares with 10th value, then with 6th, 3rd, 2nd and 1st values from top10). Obviously it makes less comparisons for the first 10 rows.

Now let’s talk about statements #1 and #2:
We know that rownum forces optimizer_mode to switch to “first K rows”, because of the parameter “_optimizer_rownum_pred_based_fkr”

SQL> @param_ rownum

NAME                               VALUE  DEFLT  TYPE      DESCRIPTION
---------------------------------- ------ ------ --------- ------------------------------------------------------
_optimizer_rownum_bind_default     10     TRUE   number    Default value to use for rownum bind
_optimizer_rownum_pred_based_fkr   TRUE   TRUE   boolean   enable the use of first K rows due to rownum predicate
_px_rownum_pd                      TRUE   TRUE   boolean   turn off/on parallel rownum pushdown optimization

while fetch first/row_number doesn’t (it will be changed after the patch #22174392) and it leads to the following consequences:
1. first_rows disables serial direct reads optimization, that’s why the tests with big tables showed that “fetch first” were much faster than the query with rownum.
So if we set “_serial_direct_read”=always, we get the same performance in both tests (within the margin of error).

2. In cases when index access (index full scan/index range scan) is better, CBO differently calculates the cardinality of underlying INDEX FULL(range) SCAN:
the query with rownum is optimized for first_k_rows and the cardinality of index access is equal to K rows, but CBO doesn’t reduce cardinality for “fetch first”, so the cost of index access is much higher, compare them:

rownum
SQL> explain plan for
  2  select *
  3  from (select * from test order by a,b)
  4  where rownum<=10;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   390 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |            |       |       |            |          |
|   2 |   VIEW                        |            |    10 |   390 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M|     4   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |    10 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)

[collapse]
fetch first

SQL> explain plan for
  2  select *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  1 |  VIEW                    |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000K|    12M|    22M|  5438   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | TEST |  1000K|    12M|       |   690   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]
fetch first + first_rows

SQL> explain plan for
  2  select/*+ first_rows */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                         |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY       |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]
fetch first + index

SQL> explain plan for
  2  select/*+ index(test (a,b)) */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                         |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY       |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]

So in this case we can add hints “first_rows” or “index”, or install the patch #22174392.

ps. I thought to post this note later, since I hadn’t time enough to add other interesting details about the different TopN variants, including “with tie”, rank(), etc, so I’ll post another note with more details later.

Querying the amount of redo in Oracle Database

When a database starts to churn more redo then normal, it is handy to be able to extract the amount of redo over time, to be able to plot this in a graph.  The below allows you to extract this info 🙂

Query for redo generation

Query to obtain the amount of redo generation over time by hour and MB:

set pages 999 lines 400
select to_char(trunc(first_time, 'HH24'), 'DD/MM/YYYY HH24:MI:SS') date_by_hour, sum(round(blocks*block_size/1024/1024)) CHURN_IN_MB
from v$archived_log
 (more...)

Polymorphic Table Functions – Part 2

In my previous post I wrote about a possible use-case for Polymorphic Table Functions. I also wrote that I had some ideas to implement extra functionality to make the function more generic.
After attending DOAG Konferentz und ausstellung 2018 and UKOUG Tech18 I finally came around to publishing my new version.
Lets start off with a table to hold my data, I chose to use the NATO phonetic alphabet for this:

create table t_alphabet
(thecount  (more...)

Ranges with NULLs 07: Swiss Army Knife

I came up with this name a few years ago for a problem that I couldn't solve then: analyse a series of ranges and say whether and how they overlap. It turns out the solution is not that hard.

Lab Correction

Anyone using the August 2018 Fedora image should note that I neglected to put the right transaction_upload2.csv file in the /u01/app/oracle/upload directory. You can fix that by navigating to the Lab 12 Instructions web page and click on the zip file link to download the correct file. You will see the following dialog asking whether you want to open the file with the Ark utility, click OK to continue:

After clicking OK to open (more...)