Showing posts with label ETL. Show all posts
Showing posts with label ETL. Show all posts

2016-08-10

QUERY parameter in Oracle datapump import

Environment

Oracle 11.2.0.4

Symptoms

Using parameters
TABLES= CDR.F_CLAIM_WORK_CAPACITY
QUERY="where claim_ID in (select claim_id from Z.E_ROLLING_CLAIM )"

output
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
UDI-00014: invalid value for parameter, 'query'

I believe Oracle cannot identify a table of the first claim_ID.

Solutions


Syntax
QUERY = [schema.][table_name:] query_clause

Using [schema.][table_name:] for the same query
TABLES= CDR.F_CLAIM_WORK_CAPACITY
QUERY=CDR.F_CLAIM_WORK_CAPACITY:"where claim_ID in (select claim_id from Z.E_ROLLING_CLAIM )"

that works (the schema was remapped from CDR to Z)
. . imported "Z"."F_CLAIM_WORK_CAPACITY"    9.391 MB  154122 out of 157095 rows

The table alias used by Data Pump for the table being unloaded is KU$. Using KU$.
TABLES= CDR.F_CLAIM_WORK_CAPACITY
QUERY="where KU$.claim_ID in (select claim_id from Z.E_ROLLING_CLAIM )"
works
. . imported "Z"."F_CLAIM_WORK_CAPACITY"     9.391 MB  154122 out of 157095 rows

It works for multiple tables too, elapsed 0 00:17:43
TABLES= CDR.F_CLAIM_WORK_CAPACITY,CDR.K_CLAIM_KEY_FIGURES
. . imported "Z"."K_CLAIM_KEY_FIGURES"     4.088 GB 36215408 out of 41843789 rows
. . imported "Z"."F_CLAIM_WORK_CAPACITY"   9.391 MB  154122 out of 157095 rows

The following two parameters do not work and return the same error:
UDI-00014: invalid value for parameter, 'query'

QUERY="where exists (select 1 from Z.E_ROLLING_CLAIM rc where rc.claim_id=ku$.claim_ID )"
QUERY="where exists (select claim_ID from Z.E_ROLLING_CLAIM rc where rc.claim_id=ku$.claim_ID )"
I understand that "select claim_ID" is ambiguously defined, but I expected "select 1" would work.
Anyway the following works fine.
QUERY="where exists (select rc.claim_ID from Z.E_ROLLING_CLAIM rc where rc.claim_id=ku$.claim_ID )"

. . imported "Z"."K_CLAIM_KEY_FIGURES"    4.088 GB 36215408 out of 41843789 rows
. . imported "Z"."F_CLAIM_WORK_CAPACITY"  9.391 MB  154122 out of 157095 rows

and it is a bit faster (elapsed 0 00:12:27) than "where claim_id in ()". Maybe just because of cache :-)


2014-09-05

Configuring Oracle Database 11g Gateway for ODBC MS SQL Server

Environment

Oracle database 11.2
Oracle Linux Server release 5.7
Microsoft® ODBC Driver 11 for SQL Server - RedHat Linux
Microsoft SQL Server 2005,2008R2

Installing ODBC driver

Install the driver according to MS http://www.microsoft.com/en-us/download/details.aspx?id=36437

To verify that the ODBC Driver on Linux was registered successfully, execute the following command:
odbcinst -q -d -n "ODBC Driver 11 for SQL Server"

edit ~/.odbc.ini
and add

[infraUAT]
Driver=ODBC Driver 11 for SQL Server
Description=My Sample ODBC Database Connection
Trace=Yes
Server=gtpsql2
Port=1433
Database=InfraEnt_ITSD_UAT

"Driver" has to be same as output of odbcinst -q -d, which comes from /etc/odbcinst.ini

Test it
odbcinst -q -s
[infraUAT]

isql -v infraUAT user password
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> quit

Configuring Gateway

cd $ORACLE_HOME/hs/admin
If you use initdg4odbc.ora, then dg4odbc will be SID.
edit initdg4odbc.ora
S_FDS_CONNECT_INFO = infraUAT
HS_FDS_SHAREABLE_NAME = /usr/lib64/libodbc.so
# you need to match HS_LANGUAGE to SQLS
HS_LANGUAGE=AMERICAN_AMERICA.WE8MSWIN1252
# you might need the following HS_
#HS_KEEP_REMOTE_COLUMN_SIZE=LOCAL
#HS_NLS_LENGTH_SEMANTICS=CHAR
#nvarchars of a SQL Server are UCS2 character set
#HS_NLS_NCHAR=UCS2
# ODBC specific environment variables
set ODBCINI=/home/oracle/.odbc.ini

HS_LANGUAGE needs to match code page of SQL Server.
When Oracle use unicode AL32UTF8, it will fail to connect. See Doc ID 756186.1
ERROR at line 1:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[

edit listener.ora and add new SID_DESC, where SID matches initdg4odbc.ora
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
        (SID_NAME = dg4odbc)
        (ORACLE_HOME = /opt/oracle/product/se1/11.2.0.3)
        (PROGRAM = dg4odbc)
        (ENVS='LD_LIBRARY_PATH=/usr/lib64:/opt/microsoft/msodbcsql/lib64:/opt/oracle/product/se1/11.2.0.3/lib')
    )
  )
Restart listener
configure  tnsnames.ora
HS1 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = tcp)(HOST = )(PORT = 1521))
    )
    (CONNECT_DATA =
      (SID = dg4odbc)
    )
    (HS=OK)
  )

Configure DB Link
CREATE PUBLIC DATABASE LINK INFRA
 CONNECT TO DSUSER
 IDENTIFIED BY password
 USING 'hs1';

Problem 1 - code page conversion

Some symbols are not converted correctly, e.g. Microsoft double quote or single quote. I have not tested non-latin alphabets.
SELECT short_problem  FROM CL_CALL_LOGGING@infra where call_number=4234;
output
The printer on Level 5 is showing ‘perform printer maintenance’. 
Solution 1
as Oracle NLS_CHARACTERSET = AL32UTF8 (check view NLS_DATABASE_PARAMETERS), create a view in SQLS to CAST this column to unicode, then you can select from this view from Oracle.
CREATE VIEW TEST_VIEW1 AS
SELECT CALL_NUMBER, CAST(SHORT_PROBLEM AS NVARCHAR(200)) AS SHORT_PROBLEM
FROM DBO.CL_CALL_LOGGING;

in Oracle
SELECT CALL_NUMBER, SHORT_PROBLEM FROM TEST_VIEW1@INFRA WHERE CALL_NUMBER=4234;
output
The printer on Level 5 is showingperform printer maintenance.

Problem 2 - multibyte characters

ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Microsoft][ODBC Driver 11 for SQL Server]String data, right truncation {01004}
ORA-02063: preceding 2 lines from INFRA

If in SQLS a column is described as VARCHAR(200) and a value length is 200 and it contains some symbols which become multi-byte, then in Oracle it becomes longer than 200 bytes.
Solution 2
The same as Solution 1, CAST it to unicode on SQLS.
If you create similar or the same table in Oracle, define length in chars, e.g. VARCHAR(200 char)
It does not have to be NVARCHAR as in my case oracle DB is already UTF8.

Problem 3 - varchar(max)

When selecting varchar(max) from SQLS:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Microsoft][ODBC Driver 11 for SQL Server]String data, right truncation {01004}
ORA-02063: preceding 2 lines from INFRA
Solution 3.1
I haven't found a good solution for this. You can CAST(COLNAME AS TEXT) or as NTEXT in SQLS, in thin case you receive LONG type in Oracle (can be saved and converted to CLOB). Which is ok for most values (Latin characters), but it still has wrong conversion for multi-byte/non-latin characters.
Solution 3.2 - other ODBC drivers
try other drivers, e.g. Easysoft or Datadirect. I haven't tested it.
Solution 3.3 - non Oracle ETL
IBM Datastage works ok, when selecting as unicode CAST(col as NVARCHAR(max)) and keeping as LongNVarchar in DataStage to avoid Datastage conversion. It inserts into Oracle CLOB and looks ok.
Microsoft SSIS converts correctly, if destination table in Oracle is defined as NCLOB.

Documentation

Error Ora-28500 and Sqlstate I Issuing Selects From a Unicode Oracle RDBMS With Dg4odbc To Non-Oracle Databases Using the UnixODBC Driver Manager (Doc ID 756186.1)
How to Configure DG4ODBC on 64bit Unix OS (Linux, Solaris, AIX, HP-UX Itanium) to Connect to Non-Oracle Databases Post Install (Doc ID 561033.1)

2013-12-12

ETL, bulk load to Oracle DB guidelines

Environment

Oracle 11.2, Infosphere DataStage 8.7

Steps briefly

Disable constraints
Truncate target table
Disable indexes
Run bulk upload ETL job
Rebuild indexes
Create unique indexes
Enable constraints
Gather optimizer statistics

Details

Disable R constraints

Disable U,P constraints

-          It drops UNIQUE index if it was created by ALTER TABLE ADD CONSTRAINT statement (use KEEP|DROP INDEX to change the behavior)
-          Other indexes remain

Truncate target table

It validates unusable indexes!

Make indexes UNUSABLE

-          UNUSABLE UNIQUE index does not allow DML. It has to be dropped.
ORA-26026: unique index BID.TEST1_I1 initially in unusable state (Direct-path Insert)
ORA-01502: index 'BID.TEST1_I1' or partition of such index is in unusable state (Regular Insert)

-          U,P constraints can also be policed by non-unique indexes

Run bulk upload ETL job

DataStage bulk load is seen as /*+ SYS_DL_CURSOR */ (OCI direct path load).
Target table needs to be NOLOGGING to reduce REDO (unrecoverable operation!)
For direct-path INSERT e.g. /*+ APPEND */,  /*+ SYS_DL_CURSOR */
table mode
DB log_mode
REDO
recovery
logging
noarchivelog
no redo
PITR before insert
nologging
noarchivelog
no redo
PITR before insert
logging
archivelog
redo
to the failure
nologging
archivelog
no redo
PITR before insert

Other DML always generate REDO.
If you do direct-path insert into table with indexes, index update will generate redo whether it is LOGGING or NOLOGGING. (and index is recoverable)

Rebuild indexes

Use NOLOGGING, PARALLEL to speed up. Unrecoverable operation!
After DB/TS recovery you might have:
SELECT COUNT(1) FROM BID.S_CLAIM_STATS_YEARLY WHERE INDUSTRY_ID=18005047
*
Error at line 0
ORA-01578: ORACLE data block corrupted (file # 73, block # 10488)
ORA-01110: data file 73: '/data001/oradata/BIDDEV/biddev_bid_data_test.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option

Attempt to rebuild index returns the same errors. (ALTER INDEX BID.CLMSTSYR_IND_FK REBUILD;)
Index has to be (dropped and recreated) or (made unusable and then rebuild).

Create unique indexes

If it is not part of constraint or you can leave it for CONSTRAINT ENABLE statements

Enable constraints R,U,P

If there are no proper indexes for U,P constrains, CONSTRAINT ENABLE statement will create the index. By default it will be LOGGING in DEFAULT tablespace.  Use USING INDEX to change it (you don’t have to specify index name), e.g.
ALTER TABLE BID.TEST1 ENABLE CONSTRAINT TEST1_U USING INDEX NOLOGGING TABLESPACE BID_DATA_TEST;

Gather optimizer statistics

Gather statistics for the table. Indexes can be excluded if they were rebuilt (CASCADE=FALSE) as Oracle automatically collects statistics during index creation and rebuild.
Alternatively statistics for table+indexes can be gathered before rebuilding indexes, especially if not all indexes are disabled or dropped.

Documentation

Direct Path Loading
INSERT statement
logging_clause