2011-06-09

Database Character Set Conversion

Environment
Oracle 10.2.0.3, AIX 5.3

Requirements
Change database character set from AL32UTF8 to WE8ISO8859P1.

To see the current set:
select * from database_properties where property_name='NLS_CHARACTERSET';
select * from nls_database_parameters where parameter='NLS_CHARACTERSET';


Prerequisites
1. Install CSSCAN (Oracle support note ID 745809.1)
csscan \"sys as sysdba\" full=y
CSS-00107: Character set migration utility schema not installed
Scanner terminated unsuccessfully.

sqlplus / as sysdba
SQL> @?/rdbms/admin/csminst.sql

2. Apply Oracle patch 5879179 (to fix bug 5879179, specific to version 10.2.0.3 64bit)
It should ok for versions higher than 10.2.0.3
Before patching CSSCAN returns multiple errors:
ORA-22925: operation would exceed maximum size allowed for a LOB value

Solution
Character set conversion is done using Oracle utilities CSSCAN and CSALTER.
The main steps are:
1.    Clean the database
2.    CSSCAN
3.    Database backup
4.    Dealing with Truncation and Lossy data
5.    Dealing with Convertible data
6.    CSSCAN
7.    Conversion
8.    Post conversion steps

1. Clean the database. Delete unneeded schemas and tables. Purge database recycle bin.
PURGE DBA_RECYCLEBIN;

2. Run CSSCAN as SYSDBA user.
csscan \"sys as sysdba\" full=y

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

Current database character set is AL32UTF8.

Enter new database character set name: > WE8ISO8859P1

Enter array fetch buffer size: 1024000 >

Enter number of scan processes to utilize(1..32): 1 > 4

Enumerating tables to scan...

The output goes into 3 files: scan.err, scan.out, scan.txt
Detailed description can be found in Oracle Support document "Csscan output explained [ID 444701.1]".

3. Backup database if it has not been done yet.

4. Dealing with Truncation and Lossy data
Check file scan.txt
With some lossy data, I had the following summary:
[Scan Summary]
All character type data in the data dictionary are convertible to the new character set
Some character type application data are not convertible to the new character set

There was no Truncation data in this DB.
Lossy data were found in one table (TOAD_PLAN_TABLE) and I’ve just dropped it.
More info about dealing with Lossy and Transaction data can be found in Oracle document " Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode) [ID 260192.1]"

5. Dealing with Convertible data
"Convertible" Data Dictionary CLOB data is handled by CSALTER. No action required.
"Convertible" Application Data has to be exported and deleted!
When there are application Convertible data without Truncation/Lossy data, scan summary in scan.txt looks like:
[Scan Summary]
All character type data in the data dictionary are convertible to the new character set
All character type application data are convertible to the new character set

In this case there were two application tables with convertible data.
[Application Data Conversion Summary]
Datatype         Changeless  Convertible   Truncation  Lossy
---------- ---------------- ------------ ------------ ------
VARCHAR2     15,045,735,123          280            0      0
CHAR                      0            0            0      0
LONG                      0            0            0      0
CLOB                      0            0            0      0
VARRAY                    0            0            0      0
---------- ---------------- ------------ ------------ ------

[Distribution of Convertible, Truncated and Lossy Data by Table]
USER.TABLE                Convertible  Truncation  Lossy
------------------------ ------------ ----------- ------
EDWSTG_HM.JOBS                     21           0      0
EDWSTG_HM.TRANSACTIONS            259           0      0

Export and truncate
expdp \"/ as sysdba\" DUMPFILE=t01_%u.dmp LOGFILE=t01.log tables=EDWSTG_HM.JOBS,EDWSTG_HM.TRANSACTIONS parallel=4

SQL> truncate table EDWSTG_HM.JOBS;
SQL> truncate table EDWSTG_HM.TRANSACTIONS;

6. Run CSSCAN again.
csscan \"sys as sysdba\" full=y

To run CSALTER the following output of CSSCAN (scan.txt) is required:

[Scan Summary]
All character type data in the data dictionary are convertible to the new character set
All character type application data remain the same in the new character set

[Data Dictionary Conversion Summary]


The data dictionary can be safely migrated using the CSALTER script

7. Conversion
Shutdown listener and make sure nothing else connects to the database.

sqlplus / as sysdba
show parameter job_queue_processes
show parameter aq_tm_processes
-- assuming spfile is in use
alter system set job_queue_processes=0;
alter system set aq_tm_processes=0;
shutdown immediate;

startup restrict
spool convert.log
@?/rdbms/admin/csalter.plb

Restart the database and listener.

8. Post conversion steps
Restore altered parameters
alter system set job_queue_processes=10;
alter system set aq_tm_processes=1;

Import exported tables
impdp \"/ as sysdba\" dumpfile=t01_%u.dmp logfile=t01.imp.log table_exists_action=append parallel=4

Useful Oracle support notes
Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner) [ID 745809.1]
Csscan output explained [ID 444701.1]
Changing the Database Character Set ( NLS_CHARACTERSET ) [ID 225912.1]
Changing the Database Character Set - Frequently Asked Questions [ID 227337.1]