Top-N again: fetch first N rows only vs rownum

Three interesting myths about rowlimiting clause vs rownum have recently been posted on our Russian forum:

  1. TopN query with rownum<=N is always faster than "fetch first N rows only" (ie. row_number()over(order by ...)<=N)
  2. “fetch first N rows only” is always faster than rownum<=N
  3. “SORT ORDER BY STOPKEY” stores just N top records during sorting, while “WINDOW SORT PUSHED RANK” sorts all input and stores all records sorted in memory.

Interestingly that after Vyacheslav posted first statement as an axiom and someone posted old tests(from 2009) and few people made own tests which showed that “fetch first N rows” is about 2-3 times faster than the query with rownum, the final decision was that “fetch first” is always faster.

First of all I want to show that statement #3 is wrong and “WINDOW SORT PUSHED RANK” with row_number works similarly as “SORT ORDER BY STOPKEY”:
It’s pretty easy to show using sort trace:
Let’s create simple small table Tests1 with 1000 rows where A is in range 1-1000 (just 1 block):

create table test1(a not null, b) as
  select level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;

ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;

And we can see from the trace files that both queries did the same number of comparisons:

rownum:
----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            10
Total number of comparisons performed     999
  Comparisons performed by in-memory sort 999
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]
row_number

----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            10
Total number of comparisons performed     999
  Comparisons performed by in-memory sort 999
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]

Ie. each row (except first one) was compared with the biggest value from top 10 values and since they were bigger than top 10 value, oracle doesn’t compare it with other TopN values.

And if we change the order of rows in the table both of these queries will do the same number of comparisons again:

from 999 to 0

create table test1(a not null, b) as
  select 1000-level, level from dual connect by level<=1000;

alter session set max_dump_file_size=unlimited;
ALTER SESSION SET EVENTS '10032 trace name context forever, level 10';

ALTER SESSION SET tracefile_identifier = 'rownum';
select * from (select * from test1 order by a) where rownum<=10;


ALTER SESSION SET tracefile_identifier = 'rownumber';
select * from test1 order by a fetch first 10 rows only;

[collapse]
rownum

----- Current SQL Statement for this session (sql_id=bbg66rcbt76zt) -----
select * from (select * from test1 order by a) where rownum<=10

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            1000
Total number of comparisons performed     4976
  Comparisons performed by in-memory sort 4976
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]
row_number

----- Current SQL Statement for this session (sql_id=duuy4bvaz3d0q) -----
select * from test1 order by a fetch first 10 rows only

---- Sort Statistics ------------------------------
Input records                             1000
Output records                            1000
Total number of comparisons performed     4976
  Comparisons performed by in-memory sort 4976
Total amount of memory used               2048
Uses version 1 sort
---- End of Sort Statistics -----------------------

[collapse]

We can see that both queries required much more comparisons(4976) here, that’s because each new value is smaller than the biggest value from the topN and even smaller than lowest value, so oracle should get right position for it and it requires 5 comparisons for that (it compares with 10th value, then with 6th, 3rd, 2nd and 1st values from top10). Obviously it makes less comparisons for the first 10 rows.

Now let’s talk about statements #1 and #2:
We know that rownum forces optimizer_mode to switch to “first K rows”, because of the parameter “_optimizer_rownum_pred_based_fkr”

SQL> @param_ rownum

NAME                               VALUE  DEFLT  TYPE      DESCRIPTION
---------------------------------- ------ ------ --------- ------------------------------------------------------
_optimizer_rownum_bind_default     10     TRUE   number    Default value to use for rownum bind
_optimizer_rownum_pred_based_fkr   TRUE   TRUE   boolean   enable the use of first K rows due to rownum predicate
_px_rownum_pd                      TRUE   TRUE   boolean   turn off/on parallel rownum pushdown optimization

while fetch first/row_number doesn’t (it will be changed after the patch #22174392) and it leads to the following consequences:
1. first_rows disables serial direct reads optimization, that’s why the tests with big tables showed that “fetch first” were much faster than the query with rownum.
So if we set “_serial_direct_read”=always, we get the same performance in both tests (within the margin of error).

2. In cases when index access (index full scan/index range scan) is better, CBO differently calculates the cardinality of underlying INDEX FULL(range) SCAN:
the query with rownum is optimized for first_k_rows and the cardinality of index access is equal to K rows, but CBO doesn’t reduce cardinality for “fetch first”, so the cost of index access is much higher, compare them:

rownum
SQL> explain plan for
  2  select *
  3  from (select * from test order by a,b)
  4  where rownum<=10;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   390 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |            |       |       |            |          |
|   2 |   VIEW                        |            |    10 |   390 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M|     4   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |    10 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)

[collapse]
fetch first

SQL> explain plan for
  2  select *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  1 |  VIEW                    |      |    10 |   780 |       |  5438   (1)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000K|    12M|    22M|  5438   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | TEST |  1000K|    12M|       |   690   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]
fetch first + first_rows

SQL> explain plan for
  2  select/*+ first_rows */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                         |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY       |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]
fetch first + index

SQL> explain plan for
  2  select/*+ index(test (a,b)) */ *
  3  from test
  4  order by a,b
  5  fetch first 10 rows only;

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  1 |  VIEW                         |            |    10 |   780 | 27376   (1)| 00:00:02 |
|*  2 |   WINDOW NOSORT STOPKEY       |            |  1000K|    12M| 27376   (1)| 00:00:02 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST       |  1000K|    12M| 27376   (1)| 00:00:02 |
|   4 |     INDEX FULL SCAN           | IX_TEST_AB |  1000K|       |  2637   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."A","TEST"."B")<=10)

[collapse]

So in this case we can add hints “first_rows” or “index”, or install the patch #22174392.

ps. I thought to post this note later, since I hadn’t time enough to add other interesting details about the different TopN variants, including “with tie”, rank(), etc, so I’ll post another note with more details later.

Counting Rows

Here’s another little utility I use from time to time (usually for small tables) to check how many rows there are in each block of the table, and which blocks are used. It doesn’t do anything clever, just call routines in the dbms_rowid package for each rowid in the table:


rem
rem     Rowid_count.sql
rem     Generic code to count rows per block in a table
rem     Ordered by file and block
rem

define m_table =  (more...)

This Week in PostgreSQL – May 31

Since last October I’ve been periodically writing up summaries of interesting content I see on the internet related to PostgreSQL (generally blog posts). My original motivation was just to learn more about PostgreSQL – but I’ve started sharing them with a few colleagues and received positive feedback.  Thought I’d try posting one of these digests here on the Ardent blog – who knows, maybe a few old readers will find it interesting? Here’s the (more...)

asmcmd “connected to an idle instance” – not

This is more a note to myself in case I’ll encounter a similar environment. But maybe it helps others – at least my search results weren’t suitable to Windows in the first place. Issue C:\> set ORACLE_HOME=C:\path\to\grid\home C:\> set ORACLE_SID=+ASM1 C:\> asmcmd connected to an idle instance. Environment Windows 2012R2 Oracle Grid Infrastructure 12.1.0.1 2 […]

“Collection iterator pickler fetch”: pipelined vs simple table functions

Alex R recently discovered interesting thing: in SQL pipelined functions work much faster than simple non-pipelined table functions, so if you already have simple non-pipelined table function and want to get its results in sql (select * from table(fff)), it’s much better to create another pipelined function which will get and return its results through PIPE ROW().

A bit more details:

Assume we need to return collection “RESULT” from PL/SQL function into SQL query “select (more...)

Oracle issues after upgrade to 12.2

Sometimes it’s really hard even to create reproducible test case to send it to oracle support, especially in case of intermittent errors.
In such cases, I think it would be really great to have access to similar service requests or bugs of other oracle clients.
So while my poll about knowledge sharing is still active, I want to share a couple of bugs we have faced after upgrade to 12.2 (and one bug from (more...)

Garbled display while running FMW installer on Linux

A colleague faced this while running FMW installer on a Linux machine. The display appeared like this

 

 

 

 

 

 

 

This thread gave a clue that it could have something to do with fonts. So I checked what all fonts related stuff was installed.

[root@someserver ~]# rpm -aq |grep -i font
stix-fonts-1.1.0-5.el7.noarch
xorg-x11-font-utils-7.5-20.el7.x86_64
xorg-x11-fonts-cyrillic-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-1-75dpi-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-9-100dpi-7.5-9.el7. (more...)

root.sh fails with CRS-2101:The OLR was formatted using version 3

Got this while trying to install 11.2.0.4 RAC on Redhat Linux 7.2. root.sh fails with a message like

ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2017-11-09 15:43:37.883:
[client(37246)]CRS-2101:The OLR was formatted using version 3.

This is bug 18370031. Need to apply the patch before running root.sh.

Easy(lazy) way to check which programs have properly configured FetchSize

select 
   s.module
  ,ceil(max(s.rows_processed/s.fetches)) rows_per_fetch
from v$sql s
where 
    s.rows_processed>100
and s.executions    >1
and s.fetches       >1
and s.module is not null
and s.command_type  = 3    -- SELECTs only
and s.program_id    = 0    -- do not account recursive queries from stored procs
and s.parsing_schema_id!=0 -- <> SYS
group by s.module
order by rows_per_fetch desc nulls last
/

TNS-12543: TNS:destination host unreachable

Scenario : Setting up a physical standby from Exadata to a non-Exadata single instance. tnsping from standby to primary works fine but tnsping from primary to standby fails with:

TNS-12543: TNS:destination host unreachable

I am able to ssh standby from primary, can ping as well but tnsping doesn’t work.  From the error description we can figure out that something is blocking the access. In this case it was iptables that was enabled on the standby (more...)

JDBC, Linux and Entropy

Some troubles — especially those happening only sporadically — are not so easy to shoot and call for a deeper understanding of the matter. In the following real-world example this means: SQL*Net Tracing and some knowledge about the inner workings of the server’s operating system, particularly random number generation. This case was suited well to demonstrate an approach to trouble-shoot […]

Exadata Upgrade to 12.2.0.1 – The Missing Step

I decided this week to be a little brave and upgrade one of the Enkitec Exadata racks to 12.2.0.1.  I installed the 12.2.1.0.0 Exadata image a few weeks ago, and have been waiting for a chance to upgrade clusterware to 12.2.  Thankfully, Oracle provides a very good note for this, but I did hit one large snag that should be documented.

The process for upgrading GI to 12.2 (more...)

Exadata Cell Patching – USB I/O Errors

I was playing around with the Exadata X2-2 in the Enkitec lab this weekend, and hit an interesting issue when patching the storage servers.  We were taking the system up to version 12.1.2.3.3 for testing purposes.  I fired off the patchmgr script, and one of the storage servers failed when beginning the first phase of the patching cycle:

[root@enkdb03 patch_12.1.2.3.3.161208]# ./patchmgr -cells cell_group -patch -ignore_alerts
 (more...)

Reduce Hard Parse time using SQL Profile

Few days ago we had concurrency problem with "cursor: pin S wait on X" wait event. This wait event is mostly associated with parsing in some form.

After quick diagnosis I’ve found problematic query. It was fairly complex query which was executed very often with average 0.20 seconds of execution time. As this query was using bind variables, Oracle reused existing plan and problems with "cursor: pin S wait on X" wait (more...)

ORA-38301: can not perform DDL/DML over objects in Recycle Bin

           While dropping a tablespace, getting error ORA-38301: can not perform DDL/DML over objects in Recycle Bin .

SQL> drop tablespace DATA including contents and datafiles;
drop tablespace DATA including contents and datafiles
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-38301: can not perform DDL/DML over objects in Recycle Bin

 

Solution:

Before dropping the tablespace, disable recyclbin . 

1. Disable recyclebin:

SQL> ALTER SYSTEM SET recyclebin (more...)

RMAN-20005: target database name is ambiguous

                          While running rman commands from the catalog database, you might get the error RMAN-20005: target database name is ambiguous. , if the database was restored from another database.

ERROR:

rman target sys/oracle#51234@RMANBKP catalog rman_b2ccrmpd/rman@catdb

Recovery Manager: Release 12.1.0.2.0 – Production on Fri Oct 14 10:34:05 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

connected to (more...)

Slow full table scan due to row chaining

Few days ago I’ve received complaint that simple count on 2 million rows table is running forever.

This was the statement:

select count(1)
from CLIENT k
where k.expires is null;

I've used fake names for table name and columns.
Database version: 11.2.0.4.0

Indeed, query was running longer than I would expect. Oracle was using FULL SCAN of the table with "db file sequential read" wait events. This was little odd (more...)

Extended Stack Profiling – Ideas, Tools and Comments

Topic: This post provides a short summary and pointers to previous work on Extended Stack Profiling for troubleshooting and performance investigations.

Understanding the workload is an important part of troubleshooting activities. We seek answers to questions like: what is the system doing, where is the time spent, which code paths are most used, what are the wait events, etc. Sometimes the relevant diagnostic data is easy to find, other times we need to dig (more...)

Analyze database activity using v$log_history

The v$log_history view contains important information on how application’s users use the database , this view can help you define periods with the most activity in the database. v$log_history queries You can adapt the query to your needs, you just have to change the way you format the date to be able to drilldown to the … Continue reading Analyze database activity using v$log_history

The post Analyze database activity using v$log_history appeared first on Oracle (more...)

ORA-19599 block corruption when filesystemio_options=SETALL on ext4 file system using Linux

Few days ago I experienced strange issue in my development environment running on OEL 5.8 with EXT4 filesystem. Note - EXT4 filesystem is supported from OEL 5.6 version.

This was virtual machine running oldish 10.2.0.5.0 Oracle database.

I noticed that backup for my database is failing because of archive log corruption. As this is development database I've simply deleted corrupted archive logs and initiated full backup again. But backup (more...)