Explain (execution) plans and user experience - or - the OR-NULL-Bind problem

From time to time it happens that I discuss with other DBAs or clients about explain plans, execution times and why explain plans can be good and the user experience is bad - or vice versa.
One example can be found at my post about "Slow Autodesk Map" where I have shown that the execution time of a statement has a horrible execution time, while the execution plan is great. But the optimizer needs a lot of time to generate the execution plan.
Now, I was discussing about another thing where the execution plan itself may lead to a wrong assumption about plans and user experience / execution times.

The case:

We create a table with an index: 

create table t1_test  (col1 number); 
create index idx on t1_test(col1); 

and fill the table with some values:  
begin
    for i in 1..100 loop 
         insert into t1_test(col1) values(i); 
    end loop;
 end;
 /  
commit;

Don't forget to create the table statistics:

exec dbms_stats.gather_table_stats(user,'t1_test')  


The following select is the one that should be tuned:


select * from t1_test t 
    where (:B1 is null or t.col1 = :B1); 

As you can see, the interesting part is, what happens if the bind variable :B1 is null and what happens if the bind variable is used with a value not null.
The question was: Is there any chance that the database will not use a table access full (FTS=Full table scan)?

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 18 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1_TEST | 6 | 18 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(:B1 IS NULL OR "T"."COL1"=TO_NUMBER(:B1))

Someone said, if you create the following index, the database will always choose the index:

create index comm_i on emp(comm,'x'); 

We do the select again:
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 380 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| EMP | 10 | 380 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | COMM_I | 10 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------


And that really helped - as you can see at the above execution plan. But is that right under all cirumcstances? The question is also, is this the right and wanted behaviour? Do we really want the statement to be executed ALWAYS using the index?

The answer is NO. Because if the bind variable is null and we do have a hundreds of thousands or even millions of rows in the table, the index must be read for ALL entries and then ALL rows must be read by the Table Access by Index Rowid. So the database is doing a lot of additional work.
In addition, what will really happen if the table is filled up with some millions of rows and the statistics are recreated? Is this new (function-based!!) index really taken?

You can guess it: No, it isn't. The database is intelligent enough that it switches back to the FTS with 2.5 million rows...

This is what happens if :B1 is null:



And this happens if :B1 is not null:



Both ways therefore lead to a full table scan regardles of the second index - which gives the right user experience with :B1 is null but e.g. if :B1 is a value (like 5) the database should only select 5 rows using an index. The FTS is taking a long time instead and the user experience is bad.

There is no direct solution for this problem as you can't create just an index for that. The best solution is to rewrite the statement:

if :b1 is null then
    select * from t1_test t
else
    select * from t1_test t 
          where t.col1 = :B1; 
end if:
/

With this approach the FTS is done when the :B1 bind variable is null and an index access is done if the :B1 bind variable is not null.

But there is a second solution. The following change of the select statement will also work: 

SELECT *
      FROM T1_TEST T  

     WHERE :B1 IS NULL 
UNION ALL 
SELECT * 
     FROM T1_TEST T 
     WHERE T.COL1 = :B1;

To see the difference, we need to do a trace (with tkprof) and do have a look at the FETCH line.
The database is doing a full table scan if you execute both statements if the bind variable is null:

ORIGINAL STATEMENT:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    25001      0.57       0.59          0      28090          0     2500000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25003      0.57       0.59          0      28090          0     2500000

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
2500000   TABLE ACCESS   MODE: ANALYZED (FULL) OF 'T1_TEST' (TABLE)

UNION ALL STATEMENT:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    25001      0.90       0.89          0      28090          0     2500000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25003      0.90       0.89          0      28090          0     2500000

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
2500000   UNION-ALL
2500000    FILTER
2500000     TABLE ACCESS   MODE: ANALYZED (FULL) OF 'T1_TEST' (TABLE)
      0    INDEX   MODE: ANALYZED (RANGE SCAN) OF 'IDX' (INDEX)

Both statements are doing FTS and do have the same number of blocks read and rows selected (the difference in time is just a side effect of my environment). The index access range scan is NOT executed, even it is part of the union all select execution plan!

What happens if the bind variable is not null?
ORIGINAL SELECT:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.06       0.06          0       3848          0           5
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.06       0.06          0       3848          0           5

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 104  (TEST_EW)   (recursive depth: 1)
Number of plan statistics captured: 1
 Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      5   TABLE ACCESS   MODE: ANALYZED (FULL) OF 'T1_TEST' (TABLE)

We can see there are only 5 rows selected, but 3848 query counts! Now to the
UNION ALL SELECT:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0          3          0           5
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0          3          0           5
 Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      5   UNION-ALL
      0    FILTER
      0     TABLE ACCESS   MODE: ANALYZED (FULL) OF 'T1_TEST' (TABLE)
      5    INDEX   MODE: ANALYZED (RANGE SCAN) OF 'IDX' (INDEX)

We do see again 5 rows are selected - but with only 3 as query count. This only can happen when the full table scan, which is part of the execution plan is NOT EXECUTED.

Conclusion:

Don't rely yourself on indexes or index access only or even say a Full Table Scan in an explain plan is evil. As we can see at this post, the UNION ALL select combines a full table scan and an index access at the execution plan, but the database EXECUTES only one of both sub-selects. As the database doesn't evaluates the bind variable (due to cursor sharing) while CREATING the execution plan, it does evaluate the bind variable BEFORE EXECUTING the plan - and "optimizes" the execution to one of the both sub-selects. With this in mind we can expect and we do see at the trace output that the user experience is best either with the if - else pl/sql or with the UNION ALL select.

What is faster - the pl/sql part or the UNION ALL? Well, if you execute PL/SQL with SQL the database needs to do context switches (the old rule says: If you can do it in SQL, do it in SQL, if you can't do it in SQL, do it in PL/SQL, if you can't do it in PL/SQL, do it in ...). So if you have a loop around such kind of select, you may spend a lot of time on context switches: the UNION ALL should be the first choice here.
Lesson learned: If you need to tune statements - don't have a look only at the execution plans. The right starting point for tuning is always the user experience.