Wednesday, July 7, 2010

How to Calculate Join Cardinality

How Calculate Join Cardinality

SQL> create table tab1 as select
2 trunc(dbms_random.value(0,100)) col1,
3 chr(trunc(dbms_random.value (97, 103))) col2
4 from dual connect by level <= 1000;
Table created.

Wrote file afiedt.buf
1 create table tab2 as select
2 trunc(dbms_random.value(0,1000)) col1,
3 chr(trunc(dbms_random.value (100, 103))) col2
4* from dual connect by level <= 10000 SQL>
/

Table created.

SQL>


Gather stats for both the tables with "For all columns size 1" with no histograms.

SQL> exec dbms_stats.gather_table_stats (ownname=>'SCOTT', tabname=>'TAB1',
method_opt=>'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats (ownname=>'SCOTT', tabname=>'TAB2',
method_opt=>'for all columns size 1');

PL/SQL procedure successfully completed.

SQL>

SQL> ed
Wrote file afiedt.buf

1 select count(*)
2 from
3 tab1,
4 tab2
5 where
6 tab1.col1 = tab2.col1 and
7* tab1.col2 in (chr(97), chr(98), chr(99))
SQL> /

COUNT(*)
----------
5318


Execution Plan
----------------------------------------------------------
Plan hash value: 2043035240

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 10 (10)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | HASH JOIN | | 5000 | 40000 | 10 (10)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| TAB1 | 500 | 2000 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TAB2 | 10000 | 40000 | 6 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TAB1"."COL1"="TAB2"."COL1")
3 - filter("TAB1"."COL2"='a' OR "TAB1"."COL2"='b' OR
"TAB1"."COL2"='c')


The forumula to calucate the Join cardinaliy as per reference from CBO Fundamentals and NoteId 68992.1 is..

Join Selectivity = 1 / max[(NDV(tab1.col1), NDV(tab2.col1)] *
((card tab1 - # tab1.col1 NULLS) / card tab1) *
((card tab2 - # tab2.col1 NULLS) / card tab2))


with the join cardinality then given as
Join Cardinality = card(tab1) * card(tab1) * sel(Pj)

Where
NDV: Number of Distinct Values
Cardinality: Number of rows
Selectivity: Proportion of a dataset returned by a particular predicate(or group of predicates)


Wrote file afiedt.buf

1 select TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS from user_tab_col_statistics
2* where column_name = 'COL1'
SQL> /

TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS
------------------------------ ------------------------------ ------------ ---------- -----------
TAB2 COL1 1000 0 1
TAB1 COL1 100 0 1

SQL>


So In our case
Join Selectivity = 1 / Max[1000,100] *
((500-0) / 500) *
((10000-0) / 10000) = 1/1000

Join Cardinality = 500 * 10000 * 1/1000 = 5000

If we match the calculated join cardinality with join cardinality in execution plan of the query it will be same.

References: Metalink Note ID: 68992.1 and Cost Based Oracle Fundamentals by Jonathan Lewis.

No comments: