Wednesday, July 7, 2010

String to rows

SET SERVEROUTPUT ON
DECLARE
  l_list1  VARCHAR2(50) := 'A,B,C,D,E,F,G,H,I,J';
  l_list2  VARCHAR2(50);
  l_tablen BINARY_INTEGER;
  l_tab    DBMS_UTILITY.uncl_array;
BEGIN
  DBMS_OUTPUT.put_line('l_list1 : ' || l_list1);

  -- Split the comma-separated list into a PL/SQL array.
  DBMS_UTILITY.comma_to_table (
    list   => l_list1,
    tablen => l_tablen,
    tab    => l_tab);

  FOR i IN 1 .. l_tablen LOOP
    DBMS_OUTPUT.put_line(i || ' : ' || l_tab(i));
  END LOOP;

  -- Reassemble the array back into a comma-separated list.
  DBMS_UTILITY.table_to_comma (
    tab    => l_tab,
    tablen => l_tablen,
    list   => l_list2);

  DBMS_OUTPUT.put_line('l_list2 : ' || l_list2);
END;
/
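Note that DBMS_UTILITY.comma_to_table expects each list element to be a valid Oracle identifier; for arbitrary strings a SQL-only split is a common alternative. A minimal sketch using REGEXP_SUBSTR (available from 10g onwards), with the literal 'A,B,C,D,E' standing in for any comma-separated value:

```sql
-- Split an arbitrary comma-separated string into rows without
-- DBMS_UTILITY; the elements need not be valid identifiers.
SELECT REGEXP_SUBSTR('A,B,C,D,E', '[^,]+', 1, LEVEL) AS element
FROM   dual
CONNECT BY LEVEL <=
           LENGTH('A,B,C,D,E') - LENGTH(REPLACE('A,B,C,D,E', ',')) + 1;
```

The CONNECT BY LEVEL clause generates one row per element by counting the commas in the string.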

INDEX RANGE SCAN is skipped when the first character of the search string is a wildcard


SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> exec dbms_stats.gather_table_stats('SCOTT','EMP',cascade=>TRUE);

PL/SQL procedure successfully completed.


First, we test the index scan without any wildcard as the first character of the search string.

SQL> set autot traceonly;
SQL> select * from emp where ename LIKE 'S%OT%T%%';


Execution Plan
----------------------------------------------------------
Plan hash value: 3307655913

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 273 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP | 7 | 273 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | EMP_ENAME_IDX | 7 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ENAME" LIKE 'S%OT%T%%')
filter("ENAME" LIKE 'S%OT%T%%')


Statistics
----------------------------------------------------------
165 recursive calls
0 db block gets
30 consistent gets
0 physical reads
0 redo size
826 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1 rows processed


Now we test with a wildcard as the first character.
    
SQL> set autot traceonly
SQL> select * from emp where ename LIKE '%S%OT%T%%';


Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| EMP | 1 | 39 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ENAME" LIKE '%S%OT%T%%')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
822 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


It seems that whenever a wildcard ('%', '_') is used as the first character, the optimizer skips the index range scan, because a leading wildcard leaves the start of the search string undetermined, so there is no bounded range of index keys to probe.
Let's do one more experiment: this time we escape the leading wildcard character so it is treated as a literal.


SQL> select * from emp where ename LIKE '/_S%OT%T%%' escape '/';

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 3307655913

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 273 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP | 7 | 273 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | EMP_ENAME_IDX | 7 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ENAME" LIKE '/_S%OT%T%%' ESCAPE '/')
filter("ENAME" LIKE '/_S%OT%T%%' ESCAPE '/')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1 consistent gets
0 physical reads
0 redo size
656 bytes sent via SQL*Net to client
370 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed

SQL>
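Even with a leading wildcard the index is not necessarily useless: if the query references only indexed columns, the optimizer may still read the entire index with an INDEX FULL SCAN or INDEX FAST FULL SCAN rather than probing a range. A sketch (the actual plan shape depends on statistics and version, so treat this as an illustration rather than a guaranteed outcome):

```sql
-- Only the indexed column ENAME is selected, so the optimizer can
-- satisfy the query from EMP_ENAME_IDX alone, scanning the whole
-- index even though the leading wildcard rules out a range scan.
SELECT ename FROM emp WHERE ename LIKE '%OT%';
```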

CSSCAN Character Set Scanner Utility

Oracle character sets fall into two basic categories: single-byte character sets and multi-byte character sets. When migrating from a single-byte character set to a multi-byte character set, data can be truncated depending on its size, and the affected columns must be widened. The CSSCAN utility identifies all of these culprit columns.
CSSCAN Utility Usage:
First of all, the CSSCAN utility must be installed. If it is not installed in the database, it reports the following error.

C:\>
C:\>CSSCAN \"sys/*****@orcl as sysdba\" FULL=Y
Character Set Scanner v2.0 : Release 10.1.0.4.0 - Production on Wed Feb 24 13:53:44 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.5.0 - Production
With the Partitioning, OLAP and Data Mining options
CSS-00107: Character set migration utility schema not installed
Scanner terminated unsuccessfully.
C:\>

If the CSMIG schema is not installed, run "$ORACLE_HOME/rdbms/admin/csminst.sql" as SYS in SQL*Plus.
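Assuming the script sits in its standard location under ORACLE_HOME, the install step looks roughly like this (run as SYS; in SQL*Plus the '?' shorthand expands to ORACLE_HOME):

```sql
-- Install the CSMIG schema used by CSSCAN/CSALTER.
SQL> @?/rdbms/admin/csminst.sql
```

Once the script completes, rerunning CSSCAN connects successfully, as the output below shows.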

Character Set Scanner v2.0 : Release 10.1.0.4.0 - Production on Wed Feb 24 14:06:57 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.5.0 - Production
With the Partitioning, OLAP and Data Mining options
Current database character set is AL32UTF8.
Enter new database character set name: > WE8ISO8859P1
Enter array fetch buffer size: 102400 >
Enter number of scan processes to utilize(1..32): 1 > 32

Enumerating tables to scan...
. process 1 scanning SYS.SOURCE$[AAAABHAABAAAP6JAAA]
. process 8 scanning SYS.TAB$[AAAAACAABAAAAAZAAA]
. process 22 scanning SYS.CLU$[AAAAACAABAAAAAZAAA]
. process 24 scanning SYS.OPQTYPE$[AAAAACAABAAAAAZAAA]
. process 5 scanning SYS.ICOL$[AAAAACAABAAAAAZAAA]
. process 1 scanning SYS.LIBRARY$[AAAAACAABAAAAAZAAA]
..
..
..
..
. process 9 scanning SYS.VIEW$[AAAAA+AABAAAAHBAAA]
. process 14 scanning WK_TEST.DR$WK$DOC_PATH_IDX$I[AAAPUOAADAAAE2JAAA]
. process 17 scanning SYSMAN.MGMT_E2E_JDBC_1HOUR[AAAP2JAADAAAFPxAAA]
. process 3 scanning SYSMAN.MGMT_E2E_JDBC_1DAY[AAAP2KAADAAAFP5AAA]
. process 1 scanning SYSMAN.MGMT_BLACKOUTS[AAAP2QAADAAAFQpAAA]
. process 10 scanning SYS.USER$[AAAAAKAABAAAABZAAA]


Creating Database Scan Summary Report...
Creating Individual Exception Report...
Scanner terminated successfully.

While scanning, the utility creates 32 sessions, because we supplied 32 at the
"Enter number of scan processes to utilize(1..32): 1 >" prompt,
as the following query confirms:
SQL> SELECT COUNT(*) FROM V$SESSION WHERE PROGRAM = 'csscan.exe';

COUNT(*)
----------
32

After the scan completes, it writes three output files to the file system:

Scan.txt : Database Scan Summary Report
Scan.err : Database Scan Individual Exception Report (contains every row that would be truncated, with its table name, cell data, and rowid)
Scan.out : Scan log
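Before planning a migration it is also worth confirming the current database character set, which is the "from" side of the conversion; a quick check:

```sql
-- Current database character set, as recorded in the data dictionary.
SELECT value
FROM   nls_database_parameters
WHERE  parameter = 'NLS_CHARACTERSET';
```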


CSALTER Utility to Alter the Database Character Set

Two methods are available for migrating a character set:

- Using the Import and Export utilities
- Using the CSALTER utility

The CSALTER script is part of the Database Character Set Scanner utility. It is the most straightforward way to migrate a character set, but it can be used only if all of the schema data is a strict subset of the new character set, i.e. the new character set is a strict superset of the current one: every character in the current character set is also available in the new character set.

CSALTER handles only CLOBs that are part of the data dictionary; it does not convert user-created CLOBs.
To run CSALTER, the instance must be open in restricted mode with CLUSTER_DATABASE=FALSE, i.e. in a RAC environment only one instance may be open.

SQL> SHUTDOWN IMMEDIATE
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP RESTRICT
ORACLE instance started.

Total System Global Area 289406976 bytes
Fixed Size 1248600 bytes
Variable Size 83886760 bytes
Database Buffers 201326592 bytes
Redo Buffers 2945024 bytes
Database mounted.
Database opened.

SQL> @@csalter.plb

0 rows created.

Function created.

Function created.

Procedure created.


This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y

old 6: if (UPPER('&conf') <> 'Y') then
new 6: if (UPPER('Y') <> 'Y') then

Checking data validility...

begin converting system objects

PL/SQL procedure successfully completed.

Alter the database character set...

CSALTER operation completed, please restart database

PL/SQL procedure successfully completed.

0 rows deleted.

Function dropped.

Function dropped.

Procedure dropped.

SQL>

If there are any conversion problems, the process reports them and cleans up after itself without performing the conversion.

How to Calculate Join Cardinality

SQL> create table tab1 as select
2 trunc(dbms_random.value(0,100)) col1,
3 chr(trunc(dbms_random.value (97, 103))) col2
4 from dual connect by level <= 1000;
Table created.

SQL> create table tab2 as select
2 trunc(dbms_random.value(0,1000)) col1,
3 chr(trunc(dbms_random.value (100, 103))) col2
4 from dual connect by level <= 10000;

Table created.

SQL>


Gather statistics for both tables with method_opt => 'for all columns size 1', i.e. without histograms.

SQL> exec dbms_stats.gather_table_stats (ownname=>'SCOTT', tabname=>'TAB1',
method_opt=>'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats (ownname=>'SCOTT', tabname=>'TAB2',
method_opt=>'for all columns size 1');

PL/SQL procedure successfully completed.

SQL>

SQL> ed
Wrote file afiedt.buf

1 select count(*)
2 from
3 tab1,
4 tab2
5 where
6 tab1.col1 = tab2.col1 and
7* tab1.col2 in (chr(97), chr(98), chr(99))
SQL> /

COUNT(*)
----------
5318


Execution Plan
----------------------------------------------------------
Plan hash value: 2043035240

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 10 (10)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | HASH JOIN | | 5000 | 40000 | 10 (10)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| TAB1 | 500 | 2000 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TAB2 | 10000 | 40000 | 6 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TAB1"."COL1"="TAB2"."COL1")
3 - filter("TAB1"."COL2"='a' OR "TAB1"."COL2"='b' OR
"TAB1"."COL2"='c')


The formula to calculate the join cardinality, per Cost Based Oracle Fundamentals and Metalink Note 68992.1, is:

Join Selectivity = 1 / max[NDV(tab1.col1), NDV(tab2.col1)] *
((card tab1 - # tab1.col1 NULLs) / card tab1) *
((card tab2 - # tab2.col1 NULLs) / card tab2)

with the join cardinality then given as

Join Cardinality = card(tab1) * card(tab2) * Join Selectivity

Where
NDV: Number of Distinct Values
Cardinality: Number of rows
Selectivity: Proportion of a data set returned by a particular predicate (or group of predicates)


SQL> select TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS
2 from user_tab_col_statistics
3 where column_name = 'COL1';

TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS
------------------------------ ------------------------------ ------------ ---------- -----------
TAB2 COL1 1000 0 1
TAB1 COL1 100 0 1

SQL>


So in our case, using the filtered cardinality of TAB1 (500 of its 1000 rows survive the col2 filter):
Join Selectivity = 1 / Max[1000,100] *
((500-0) / 500) *
((10000-0) / 10000) = 1/1000

Join Cardinality = 500 * 10000 * 1/1000 = 5000

If we compare the calculated join cardinality with the join cardinality in the execution plan (5000 rows at the HASH JOIN step), they match.
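The same arithmetic can be checked directly in SQL, plugging in the numbers from the statistics above:

```sql
-- Join cardinality = filtered card(TAB1) * card(TAB2) * join selectivity
--                  = 500 * 10000 * (1 / max[100, 1000])
SELECT 500 * 10000 * (1 / 1000) AS estimated_join_cardinality
FROM   dual;
-- Returns 5000, matching the HASH JOIN row estimate in the plan.
```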

References: Metalink Note ID: 68992.1 and Cost Based Oracle Fundamentals by Jonathan Lewis.