Sunday 6 October 2013

5.3 Teradata Parallel Transporter - Data Connector Operator Example (Producer and Consumer)


Example 1 : Following example uses many operators like DDL,LOAD,EXPORT,DATA CONNECTOR.
But for This lesson we concentrate only on the highlighted part of the script.
In this script we read data from a fixed width input file and load data to a Teradata Table. Next we export from this table and write data to a delimited output file.

DEFINE JOB EXPORT_EMPLOYEE_TABLE_TO_FILE
DESCRIPTION 'EXPORT SAMPLE EMPLOYEE TABLE TO A FILE'
(

/*****************************/
DEFINE SCHEMA EMPLOYEE_SCHEMA
DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
(
EMP_ID   CHAR(10),
EMP_NAME CHAR(10)
);
/*****************************/
--The schema EMPLOYEE_SCHEMA used to read the fixed with input file.
--The DATACONNECTOR PRODUCER operator uses format a 'TEXT' and
--so all fields in the schema should be define as CHAR */

/*****************************/
DEFINE SCHEMA EMPLOYEE_SCHEMA1
DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
(
EMP_ID   VARCHAR(10),
EMP_NAME VARCHAR(10)
);

/*****************************/
--THIS schema is used by Data connector consumer to write to a delimited file .
--When the format is specified as specified as 'DELIMITED' the column should be
--defined as VARCHAR

/*****************************/
DEFINE OPERATOR FILE_READER()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DATA CONNECTOR OPERATOR'
TYPE DATACONNECTOR PRODUCER
SCHEMA EMPLOYEE_SCHEMA
ATTRIBUTES
(
VARCHAR PrivateLogName    = 'file_reader_private.log',
VARCHAR FileName          = 'input_fixed_w_file.txt',
VARCHAR IndicatorMode     = 'N',
VARCHAR OpenMode          = 'Read',
VARCHAR Format            = 'text',
VARCHAR DIrectoryPath = '/home/sukul/tpttest',
VARCHAR ArchiveDirectoryPath = '/home/sukul/tpttest/archive',
INTEGER SkipRows = 1,
VARCHAR SkipRowsEveryFile = 'N'
);
/*****************************/
-- The input file input_fixed_w_file.txt is opened in the read mode indicated by the OpenMOde attribute.
-- The Directory Path in which this file would exist is '/home/sukul/tpttest'.
-- Once the file is read it will be moved to the directory /home/sukul/tpttest/archive indicated by the attribute ArchiveDirectoryPath
-- The attribute Skiprows indicates that the 1st record from the input file will be ignored.
-- Each Attribute must be separated by ,
-- Note that name of the operator has () after the name.
-- We can pass any of these attribute values through job variables files.



/*****************************/
DEFINE OPERATOR LOAD_OPERATOR
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
VARCHAR PrivateLogName = 'load_log',
VARCHAR TdpId = 'prod.tdipid',
VARCHAR UserName = 'sukul',
VARCHAR UserPassword = 'mad22dam',
VARCHAR TargetTable = 'SUPPORT_DATABASE.SOURCE_EMP_TABLE',
VARCHAR LogTable = 'SUPPORT_DATABASE.LG_Trans',
VARCHAR ErrorTable1 = 'SUPPORT_DATABASE.ET_Trans',
VARCHAR ErrorTable2 = 'SUPPORT_DATABASE.UV_Trans'
);
/*****************************/

/*****************************/
DEFINE OPERATOR DDL_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DDL OPERATOR'
TYPE DDL
ATTRIBUTES
(
VARCHAR PrivateLogName = 'DDL_log',
VARCHAR TdpId          = 'prod.tdipid',
VARCHAR UserName       = 'sukul',
VARCHAR UserPassword   = 'mad22dam',
VARCHAR ErrorList      = '3807'
);
/*****************************/


/*****************************/
DEFINE OPERATOR FILE_WRITER()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DATA CONNECTOR OPERATOR'
TYPE DATACONNECTOR CONSUMER
SCHEMA *
ATTRIBUTES
(
VARCHAR PrivateLogName    = 'file_writer_privatelog',
VARCHAR FileName          = 'output_delimited_file_list.txt',
VARCHAR IndicatorMode     = 'N',
VARCHAR OpenMode          = 'Write',
VARCHAR Format            = 'delimited',
VARCHAR FILELIST = 'Y',
VARCHAR TEXTDELIMITER = '~~'
);
/*****************************/
-- Filelist = 'Y' indicates that the file output_delimited_file_list.txt is not the actual output file.
-- Instead the output is written to the filename that is written inside the file output_delimited_file_list.txt
-- We can specify only one indirect file because we have used only once instance of consumer operator.
-- The Delimiter in the output file would be 2 bytes.(~~)

/*****************************/
DEFINE OPERATOR EXPORT_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'
TYPE EXPORT
SCHEMA EMPLOYEE_SCHEMA1
ATTRIBUTES
(
VARCHAR PrivateLogName    = 'export_privatelog',
INTEGER MaxSessions       =  32,
INTEGER MinSessions       =  1,
VARCHAR TdpId             = 'prod.tdipid',
VARCHAR UserName          = 'sukul',
VARCHAR UserPassword      = 'mad22dam',
VARCHAR AccountId,
VARCHAR SelectStmt        = 'SELECT TRIM(CAST(EMP_ID AS VARCHAR(10))) , TRIM(CAST(EMP_NAME AS VARCHAR(10))) FROM SUPPORT_DATABASE.SOURCE_EMP_TABLE;'
);
/*****************************/
-- Note that in the select statement we are converting columns to VARCHAR to make them suitable for writing by the DataConnector consumer operator.


-- Note that we can have multiple steps in a job . Each step has its own closing and opening brackets
STEP setup
(
APPLY
('DROP TABLE SUPPORT_DATABASE.SOURCE_EMP_TABLE;'),
('CREATE TABLE SUPPORT_DATABASE.SOURCE_EMP_TABLE(EMP_ID INTEGER,
EMP_NAME CHAR(10));')
TO OPERATOR (DDL_OPERATOR () );
);
-- Each SQL we write should have its own ;

STEP load_the_file
(
APPLY ('INSERT INTO SUPPORT_DATABASE.SOURCE_EMP_TABLE VALUES (:EMP_ID,:EMP_NAME);') TO OPERATOR (LOAD_OPERATOR())
SELECT * FROM OPERATOR (FILE_READER());
);

STEP export_to_file
(
APPLY TO OPERATOR (FILE_WRITER() )
SELECT * FROM OPERATOR (EXPORT_OPERATOR() [1] );
);
);





Contents of the input file are:
more input_fixed_w_file.txt
0000000001sukul
0000000002uma
0000000033bhanu
0000000004chettri

-- The input file contants 4 records.







Contents of output_delimited_file_list.txt are :
more output_delimited_file_list.txt
/home/sukul/tpttest/outputile.txt

-- This list file contains the name of the actual data file where the records will be written.
-- The actual name of the output file is /home/sukul/tpttest/outputile.txt







We run the this script using the following command:
tbuild -f tpttest2
-- tpttest2 is the name of the file containing the above script.





Following is the log of the above run:


Teradata Parallel Transporter Version 13.10.00.05
Job log: /opt/teradata/client/13.10/tbuild/logs/sukul-13.out
Job id is sukul-13, running on mhp36b21
Teradata Parallel Transporter SQL DDL Operator Version 13.10.00.05
DDL_OPERATOR: private log specified: DDL_log
DDL_OPERATOR: connecting sessions
DDL_OPERATOR: sending SQL requests
DDL_OPERATOR: disconnecting sessions
DDL_OPERATOR: Total processor time used = '0.12 Second(s)'
DDL_OPERATOR: Start : Sun Oct  6 08:34:07 2013
DDL_OPERATOR: End   : Sun Oct  6 08:34:08 2013
Job step setup completed successfully
Teradata Parallel Transporter Load Operator Version 13.10.00.04
LOAD_OPERATOR: private log specified: load_log
LOAD_OPERATOR: connecting sessions
Teradata Parallel Transporter DataConnector Version 13.10.00.05
FILE_READER Instance 1 directing private log report to 'file_reader_private.log-1'.
FILE_READER: TPT19008 DataConnector Producer operator Instances: 1
FILE_READER: TPT19003 ECI operator ID: FILE_READER-15386
FILE_READER: TPT19222 Operator instance 1 processing file '/home/sukul/tpttest/input_fixed_w_file.txt'.
LOAD_OPERATOR: preparing target table
LOAD_OPERATOR: entering Acquisition Phase
LOAD_OPERATOR: entering Application Phase
LOAD_OPERATOR: Statistics for Target Table:  'SUPPORT_DATABASE.SOURCE_EMP_TABLE'
LOAD_OPERATOR: Total Rows Sent To RDBMS:      3
LOAD_OPERATOR: Total Rows Applied:            3
LOAD_OPERATOR: disconnecting sessions
FILE_READER: TPT19221 Total files processed: 1.
LOAD_OPERATOR: Total processor time used = '1.51 Second(s)'
LOAD_OPERATOR: Start : Sun Oct  6 08:34:12 2013
LOAD_OPERATOR: End   : Sun Oct  6 08:34:27 2013
Job step load_the_file completed successfully
Teradata Parallel Transporter DataConnector Version 13.10.00.05
FILE_WRITER Instance 1 directing private log report to 'file_writer_privatelog-1'.
FILE_WRITER: TPT19007 DataConnector Consumer operator Instances: 1
FILE_WRITER: TPT19003 ECI operator ID: FILE_WRITER-15415
FILE_WRITER: TPT19222 Operator instance 1 processing file '/home/sukul/tpttest/outputile.txt'.
Teradata Parallel Transporter Export Operator Version 13.10.00.04
EXPORT_OPERATOR: private log specified: export_privatelog
EXPORT_OPERATOR: connecting sessions
EXPORT_OPERATOR: sending SELECT request
EXPORT_OPERATOR: entering End Export Phase
EXPORT_OPERATOR: Total Rows Exported:  3
FILE_WRITER: TPT19221 Total files processed: 1.
EXPORT_OPERATOR: disconnecting sessions
EXPORT_OPERATOR: Total processor time used = '1.51 Second(s)'
EXPORT_OPERATOR: Start : Sun Oct  6 08:34:29 2013
EXPORT_OPERATOR: End   : Sun Oct  6 08:34:42 2013
Job step export_to_file completed successfully
Job sukul completed successfully


-- In the above log we can see that only 3 records get loaded, 1st record is ignored as we specified Skiprows = 1

Above log shows  that the logfile name is /opt/teradata/client/13.10/tbuild/logs/sukul-13.out

If we try to read this file using the commands like more we get unreadable data as shown below:

TWB_EVENTSM-#^E^E^EsetupM-^@^M-A^B^CRQXM-?M-^X^A8A^EAsm017r-13,17,0,OperatorEnter,setup,DDL_OPERATOR,1,2013-10-06,,1,0M-^X^A^A;M-p^D
        ^DAPPLY_1[0001]opermsgsM-^L^L^E^LDDL_OPERATORM-^\^D^B^AM- ^EM-!^E^E^Esetup`^B     *M-^U^C^CRQXM-?M-^X^BD^G^E^G
SQL DDLL^K^E^K13.10.00.05`^B    *M-^V^C^CRQXM-?M-^X^A^BD^L^E^LDDL_OPERATORQ^G^E^GDDL_logM- ^A^A;M-p^D
^DAPPLY_1[0001]opermsgsM-^L^L^E^LDDL_OPERATORM-^\^D^B^AM- ^G^E^GDDL_logM-(^E^E^Esetup@^B
'u^B^CRQXM-?M-^X^A8^A^E^A ^Ct^B
'^R^B^CRQXM-?M-^X^A^BD^A^E^A F^C!^E^C!   ===================================================================
     =                                                                 =
     =                  TERADATA PARALLEL TRANSPORTER                  =
     =                                                                 =
     =            SQL DDL OPERATOR     VERSION 13.10.00.05             =
     =                                                                 =
     =          OPERATOR SUPPORT LIBRARY VERSION 13.10.00.04           =
     =                                                                 =
     = COPYRIGHT 2001-2010, TERADATA CORPORATION. ALL RIGHTS RESERVED. =
     =                                                                 =
     ===================================================================
t^B
'e^B^CRQXM-?M-^X^B^BD^E**** 08:34:07R^X^E^XSun Oct  6 08:34:07 2013@^B

To view this public log properly we need to use the command tlogview as shown below :

tlogview -l sukul-13.out

This command will show us a formatted log. Note that we use the option '-l' to specify the public log name.













Private log:

In each of the operators we specified private log name.
This private log provides a detailed log specific to the operator.
To find all the private logs in the public log we use the following command.

 tlogview -l sukul-13.out -p

The option -p indicates all the available private logs embedded in the public log.

Following is the output of the above command:


PXCRM

TWB_SRCTGT

TWB_STATUS

TWB_EVENTS

DDL_log

load_log

file_reader_private.log-1

file_writer_privatelog-1

export_privatelog

Note that 1st 4 are system defined private logs and remaining 5 are for each of the 5 operators we specified in our job.

To view either of the private logs we use the -f option of the tlogview command as follows:

tlogview -l sukul-13.out -f file_reader_private.log-1

Following is how the private log of the reader operator looks like:

     ==========================================================================
     =                                                                        =
     =                     TERADATA PARALLEL TRANSPORTER                      =
     =                                                                        =
     =              DATACONNECTOR OPERATOR VERSION  13.10.00.05               =
     =                                                                        =
     =           DataConnector UTILITY LIBRARY VERSION 13.10.00.17            =
     =                                                                        =
     =    COPYRIGHT 2001-2010, Teradata Corporation.  ALL RIGHTS RESERVED.    =
     =                                                                        =
     ======================================================================

     Operator name: 'FILE_READER' instance 1 of 1 [Producer]

**** 08:34:12 Processing starting at: Sun Oct  6 08:34:12 2013

     ======================================================================
     =                                                                        =
     =                    Operator module static specifics                    =
     =                                                                        =
     =                Compiled for platform: '32-bit HPUX-PA'                 =
     =         Operator module name:'dtacop', version:'13.10.00.05C'          =
     =                                                                        =
     =      pmdcomt_HeaderVersion: 'Common 13.10.00.10' - packing 'none'      =
     =      pmddamt_HeaderVersion: 'Common 13.10.00.01' - packing 'none'      =
     =                                                                        =
     ======================================================================

     Log will include stats only

     Operator 'dtacop' main source version:'13.10.00.21'
**** 08:34:12 From file 'input_fixed_w_file.txt', starting to send rows.
**** 08:34:19 Finished sending rows for input_fixed_w_file.txt (index 1)
     Rows sent: 3, (size: 84) CPU Time: 0.00 Seconds
     Sending stat to infrastructure: DCCheckPointNo=0
     Sending stat to infrastructure: DCRowsRead=3
     Sending stat to infrastructure: DCFilesRead=1
     Sending stat to infrastructure: DCRowErrorNo=0
     Sending stat to infrastructure: DCFileName='input_fixed_w_file.txt'
     Files read by this instance: 1
**** 08:34:25 Total processor time used = '0.00 Seconds(s)'
**** 08:34:25 Total files processed: 1








Only 3 records get loaded to the Table. Following are the contents of the table:
select * from SOURCE_EMP_TABLE              ;

EMP_ID        EMP_NAME
1        2        uma      
2        33        bhanu    
3        4        chettri  










The output delimited file is as follows:
more outputile.txt
2~~uma
33~~bhanu
4~~chettri

Note that the delimiter is 2 bytes.



No comments:

Post a Comment