Home > Uncategorized > Negativ probability

Negativ probability

Not really, but this is the first thing that came to my mind, when I looked at a query, where the optimizer calculated that “condition A AND condition B” produces more rows than “condition A OR condition B”.

See for yourself on Oracle 10.2.0.4:

create table test_sz_t
 (i number, 
  d number, 
  c number, 
  v varchar2(400));

insert into test_sz_t
with gen as (select rownum-1 rn from dual connect by level <=1e5)
select rn, mod(rn, 100), trunc(rn/1000), lpad('a',300,'x') from gen;
commit;

begin
    dbms_stats.gather_table_stats(user, 'TEST_SZ_T');
end;
/

explain plan set statement_id='or' for
select * from test_sz_t where 2=:a or (3=:b  and i > -1);
select cardinality or_card from  plan_table where id=0 and statement_id='or';

explain plan set statement_id='and' for
select * from test_sz_t where 2=:a and (3=:b  and i > -1);
select cardinality and_card from  plan_table where id=0 and statement_id='and';

and here the of the expected rowcounts output. I just read the first line of the plan to get the final rowcounts.

    OR_CARD
-----------
       1990

   AND_CARD
-----------
     100000

The i> -1 will not reduce the number of rows. In the OR case, assuming independent conditions, the probability is as expected, ProbA + ProbB – ProbA*ProbB. This corresponds to the probability 1% for the completely unknown conditions number=:a, eg 1%+1% – 1%*1%. If I had used number > :a, then 5% would have been used instead of 1%.
Now in the AND case, you should expect 1%*1% e.g. 10 rows, howver this is not the case. A higher rowcount is reported, and this is only possible with a negatve probability or an oddity/bug within the optimizer.

When you look a the full plans, I omitted above, you notice quite easily what is going on, however I have to say in the orginal statements it was not so clear at first, and unfortunate, as it happend early as part of a larger query.

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2387583053

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  1990 |   604K|  1034   (2)| 00:00:13 |
|*  1 |  TABLE ACCESS FULL| TEST_SZ_T |  1990 |   604K|  1034   (2)| 00:00:13 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter(TO_NUMBER(:A)=2 OR TO_NUMBER(:B)=3 AND "I">(-1))



PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2687592283

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |   100K|    29M|  1031   (2)| 00:00:13 |
|*  1 |  FILTER            |           |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| TEST_SZ_T |   100K|    29M|  1031   (2)| 00:00:13 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------

   1 - filter(TO_NUMBER(:B)=3 AND TO_NUMBER(:A)=2)
   2 - filter("I">(-1))

You see the condition now is evaluated as filter in the and case, and this filter had no influence on the cardinality at all.

Obviously this is a very simple and maybe unfair example, but it shows the issue I also faced for a more complex query. What was my “real life” problem? A historised rule table with two different sys_context and other conditions to select relevant rows.

Advertisements
Categories: Uncategorized
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: