Saturday 21 September 2013

2.2 Sample TPT script and explanation of job sections


Following is an example of a TPT script:

Following are the points to note:

  • The section is red shows optional job header containing comments in C- style. This section can be used to record useful information about who created it and how to use the script.

  • The section in grey is the declarative section of the script uses DEFINE statements to define the Teradata PT Objects(Operators and Schema) needed for the job

  • The section in black is executable section of the script specifies the processing statements that initiate the actions that read/extract, filter, insert, update, and delete data.
        APPLY statements
  1. Specify the operations to be performed.
  2. Operators to be used.
  1. Source and Destination of the data.
  1. Filtering options and the optional degree of parallelism.


/**************************************************************/
/* Execution:                                                 */
/*                                                            */
/* On Unix and Windows, here is the command to execute this   */
/* script:                                                    */
/*                                                            */
/*    tbuild -f GT11.txt -v jobvars.txt                       */
/*                                                            */
/* For the mainframe use the accompanying JCL.                */
/*                                                            */
/**************************************************************/
DEFINE JOB BACKUP_EMPLOYEE_TABLE
DESCRIPTION 'BACKUP SAMPLE EMPLOYEE TABLE'
(
   DEFINE SCHEMA EMPLOYEE_SCHEMA
   DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
   (
      EMP_ID   INTEGER,
      EMP_NAME CHAR(10)
   );

   DEFINE OPERATOR DDL_OPERATOR()
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DDL OPERATOR'
   TYPE DDL
   ATTRIBUTES
   (
      VARCHAR PrivateLogName = 'GT11_ddloper_log',
      VARCHAR TdpId          = @MyTdpId,
      VARCHAR UserName       = @MyUserName,
      VARCHAR UserPassword   = @MyPassword,
      VARCHAR AccountID,
      VARCHAR ErrorList      = '3807'
   );

   DEFINE OPERATOR LOAD_OPERATOR()
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER LOAD OPERATOR'
   TYPE LOAD
   SCHEMA EMPLOYEE_SCHEMA
   ATTRIBUTES
   (
      VARCHAR PrivateLogName    = 'GT11_loadoper_privatelog',
      INTEGER MaxSessions       =  32,
      INTEGER MinSessions       =  1,
      VARCHAR TargetTable       = 'TARGET_EMP_TABLE',
      VARCHAR TdpId             = @MyTdpId,
      VARCHAR UserName          = @MyUserName,
      VARCHAR UserPassword      = @MyPassword,
      VARCHAR AccountId,
      VARCHAR ErrorTable1       = 'GT11_LOADOPER_ERRTABLE1',
      VARCHAR ErrorTable2       = 'GT11_LOADOPER_ERRTABLE2',
      VARCHAR LogTable          = 'GT11_LOADOPER_LOGTABLE'
   );

   DEFINE OPERATOR EXPORT_OPERATOR()
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'
   TYPE EXPORT
   SCHEMA EMPLOYEE_SCHEMA
   ATTRIBUTES
   (
      VARCHAR PrivateLogName    = 'GT11_exportoper_privatelog',
      INTEGER MaxSessions       =  32,
      INTEGER MinSessions       =  1,
      VARCHAR TdpId             = @MyTdpId,
      VARCHAR UserName          = @MyUserName,
      VARCHAR UserPassword      = @MyPassword,
      VARCHAR AccountId,
      VARCHAR SelectStmt        = 'SELECT * FROM SOURCE_EMP_TABLE;'
   );

   STEP setup
   (
      APPLY
      ('DROP TABLE SOURCE_EMP_TABLE;'),
      ('DROP TABLE TARGET_EMP_TABLE;'),
      ('DROP TABLE GT11_LOADOPER_ERRTABLE1;'),
      ('DROP TABLE GT11_LOADOPER_ERRTABLE2;'),
      ('DROP TABLE GT11_LOADOPER_LOGTABLE;'),
      ('CREATE TABLE SOURCE_EMP_TABLE(EMP_ID INTEGER,
                                      EMP_NAME CHAR(10));'),
      ('CREATE TABLE TARGET_EMP_TABLE(EMP_ID INTEGER,
                                      EMP_NAME CHAR(10));'),
      ('INSERT INTO SOURCE_EMP_TABLE(1,''JOHN'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(2,''PETER'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(3,''FRANK'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(4,''MARY'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(5,''ELLEN'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(6,''MICHAEL'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(7,''SAM'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(8,''JONATHAN'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(9,''MICHELLE'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(10,''ALICE'');')
      TO OPERATOR (DDL_OPERATOR () );
   );

   STEP load_data
   (
      APPLY
      ('INSERT INTO TARGET_EMP_TABLE (:EMP_ID, :EMP_NAME);')
      TO OPERATOR (LOAD_OPERATOR () [1] )

      SELECT * FROM OPERATOR (EXPORT_OPERATOR() [1] );
   );
);



2.2.1 Object Definitions statements:

In the Declarative section we define all the .objects referenced by the script.
The definition statements include:



  1. DEFINE JOB (Required).

This names the TPT job and providing this name is mandatory .
We should use some descriptive name. Ex: BACKUP_EMPLOYEE_TABLE.

This Name need not be the same as the name of the tpt script. We can choose to keep it same , but it is not necessary.

Note that the jobname shown in the DEFINE JOB statement is not necessarily the same as the “jobname” specified in the tbuild statement when launching the job, although it can be.
Note: tbuild command is used to launch the TPT job.

The DEFINE JOB contains all the other object definitions(schema's and Operators) and also the processing  statements.




  1. DEFINE SCHEMA (Required)

Teradata PT requires that the job script describe the structure of the data to be processed, that
is the columns in table rows or fields in file records.

This description is called the schema.

Schemas are created using the DEFINE SCHEMA statement.

Ex:

   DEFINE SCHEMA EMPLOYEE_SCHEMA
   DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA' --> Small description.
   ( --> Open parentheses.
      EMP_ID   INTEGER,
      EMP_NAME CHAR(10)
   ); --> Note that it ends with a closing bracket and a semi colon.

 DEFINE OPERATOR EXPORT_OPERATOR()
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'
   TYPE EXPORT
   SCHEMA EMPLOYEE_SCHEMA --> The value following the keyword SCHEMA in a DEFINE OPERATOR statement identifies the schema that the operator will use to process job data.


Each distinct data structure requires a separate DEFINE SCHEMA statement.



2.2.1.B.1 Following are certain guidelines for defining schema:

  • The schema referenced in a producer operator definition describes the structure of the source data.
     
  • The schema referenced in a consumer operator definition describes the structure of the data that will be loaded into the target. The consumer operator schema can be coded as SCHEMA * (a deferred schema), which means that it will accept the scheme of the output data from the producer.
  • You can use the same schema for multiple operators.
     
  • Single Operator can only use one schema unless it is a filter Operator.
     
  • The column names in a schema definition in a Teradata PT script do not have to match the actual column names of the target table, but their data types must match exactly.



2.2.1.B.2 UNION Compatible Schemas:

A single script often requires two schemas, one each for the source and target. It is also
possible to use multiple schemas for the source data if all rows are UNION-compatible.

Two schemas are UNION-compatible if their corresponding columns have exactly the same data type attributes (type, length, precision and scale); that is, other than their column names, the schemas are identical.

If the schemas are UNION-compatible Teradata PT combines data from the sources, each being extracted by a different producer operator using a different schema, into a single output data stream using its UNION ALL feature. (will b explained in detail later on)

Ex:
DEFINE SCHEMA ABSENTEES
DESCRIPTION 'Employees who failed to attend the training session'
(
ABSENTEE_NAME CHAR(24),
EXCUSE VARCHAR(256)
);
DEFINE SCHEMA PRESENTERS
DESCRIPTION 'Employees who gave presentations at the training session'
(
PRESENTER_NAME CHAR(24),
PRESENTATION_TOPIC VARCHAR(128)
);

Both these SCHEMA's are UNION Compatible.






  1. DEFINE OPERATOR(Required)

Defines a Operator and specifies the operator attributes to which values will be assigned.

Choice of the Operator is based on the type of data source, the characteristics of the target tables, and the specific operations to be performed.


   DEFINE OPERATOR EXPORT_OPERATOR() --> EXPORT_OPERATOR is the name of the Operator.
   DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR' --> Optional Description.
   TYPE EXPORT  --> We need to specify the type of the operator
   SCHEMA EMPLOYEE_SCHEMA --> Here we specify the name of the predefined schema Object or make use of SCHEMA * as deferred schema
   ATTRIBUTES
   (
      VARCHAR PrivateLogName    = 'GT11_exportoper_privatelog',
      INTEGER MaxSessions       =  32, --> Note that attributes are separated by comma. All values for mandatory attributes must be provided.
      INTEGER MinSessions       =  1,
      VARCHAR TdpId             = @MyTdpId,
      VARCHAR UserName          = @MyUserName,
      VARCHAR UserPassword      = @MyPassword,
      VARCHAR AccountId,
      VARCHAR SelectStmt        = 'SELECT * FROM SOURCE_EMP_TABLE;'
   );



Producer Operators can read data from TD tables, Other DB tables of files.
Note that for producer operators we need to specify the input schema.
A deferred schema, specified as SCHEMA *, is not supported for producer operators.



A consumer operator “consumes” data from the data stream in order to write it to the Teradata Database, or an external data target, such as a flat file.
With Consumer Operators we can use  SCHEMA * if the input and output schemas are the same.

Teradata PT limits the number of tables consumers can load simultaneously as follows:
  1. LOAD  --> 1 table
  1. UPDATE --> 5 tables
  1. STREAM --> 127 tables
  1. SQL inserter --> 1  table


A Standalone Operators like DDL,OS ,Update Standalone can be used for processing that does not involve sending data to or receiving data from the Teradata PT operator interface, and thus does not use data streams.
It is very imp to note that if standalone is used
  • the only operator used in the script, if job steps are not used
  • the only operator used in the job step, if job steps are used


Attributes for the operator are specified within the ATTRIBUTES section as follows:

   ATTRIBUTES
   (
      VARCHAR PrivateLogName    = 'GT11_exportoper_privatelog',
      INTEGER MaxSessions       =  32,
      INTEGER MinSessions       =  1,
      VARCHAR TdpId             = @MyTdpId,
      VARCHAR UserName          = @MyUserName,
      VARCHAR UserPassword      = @MyPassword,
      VARCHAR AccountId,
      VARCHAR SelectStmt        = 'SELECT * FROM SOURCE_EMP_TABLE;'
   );

Attributes must be declared in the operator definition when:

• they are required by the operator.
• we want to assign a value that is different than the default attribute value.
• we want the option of assigning an overriding value (for either the default or assigned value) in the APPLY statement. Note that even if we want to override values in the APPLY clause we should declare it in the ATTRIBUTES section. Note that in APPLY clause of the executable section we can assign values to the attributes.
If an attribute value is assigned in the operator definition and is also assigned in the APPLY statement, the APPLY value overrides the value in the operator definition. The override value applies only to the occurrence of the operator where the override value is assigned.

The use of VARCHAR and INTEGER is optional when the attribute declaration includes a value assignment, such as for PrivateLogName and MexSessions, above.

For mandatory attributes we should provide values. For Optional attributes we can provide values or choose to  let them take default values.



2.2.1.C.1 Multivalued(Array Type) Attributes:

In TPT we can have multiple values for a few operator attributes. 

Array Attributes values can be specified as the part of :
  • DEFINE OPERATOR
  • Reference to an operator in an APPLY Statement.

Available array attributes are shown in the following table:





Example of how an array value for an attribute would appear in a DEFINE OPERATOR statement or an APPLY statement:

VARCHAR ARRAY TraceLevel = [ 'CLI', 'OPER' ]
VARCHAR TraceLevel = [ 'CLI', 'OPER' ]
TraceLevel = [ 'CLI', 'OPER' ]

Array values can be assigned in a series as shown in the following examples:

VARCHAR ARRAY TargetTable = ['table1', 'table2', …, 'tableN']
VARCHAR TargetTable = ['table1', 'table2', …, 'tableN']
TargetTable = ['table1', 'table2', …, 'tableN']

Using the ARRAY keyword in assigning an array value is optional.

An array value containing a single member (for example, ['x'] or [2]) is still considered a valid array value. In this case, the array dimension is 1. However, even this single value must be specified through array notation, that is to say, enclosed in [ ].


To omit some of the values for the array attribute, for example, to assign a first and a third value, but not the second, you can do the following: specify a value of NULL.
Ex:

• specify a value of NULL, as follows:
VARCHAR FILE_NAMES = ['/first', NULL, '/third']

• specify the omitted value with commas, as follows:
VARCHAR FILE_NAMES = ['/first', , '/third']

• Following example shows an array attribute value assigned as part of a SELECT statement:

SELECT * FROM OPERATOR (reader ATTR (FILE_NAMES = ['/first', NULL,'/ third'], MODE = 'read')



When you declare an operator attribute in an operator definition, you have the option of delay
its value assignment until you run the job script . This is done using job variables.


Object attribute values can be assigned atseveral locations within the job script.
The following list shows the locations where attribute values can be assigned, in the order they
are processed, from first to last. The last value processed is used in the job.
1 DEFINE OPERATOR
2 As part of an APPLY TO...SELECT FROM statement
So if the value is assigned in the DEFINE OPERATOR and also in APPLY clause then the one in APPLY takes effect.






2.2.2. Processing Statements:

The executable section of the job contains the APPLY statement that specify the operation to be performed by the job and the objects that will perform them.

Processing statements include following:

  1. APPLY.. TO

This specifies the:
  • Operators that will be used for load and update of data.
  • Number of instances of the operators.
  • Operator attribute values (optional. Normally we provide values for all the attributes like TDPID etc in the operator definition. But we can choose  to assign or override the values during the APPLY clause)

The APPLY statement may also include data transformations by including filter operators or through the use of derived columns in its SELECT FROM.

A job script must always contain at least one APPLY statement, and if the job contains multiple steps, each step must have an APPLY statement.


An APPLY statement typically contains two parts, which must appear in the order shown:
  1. A DML statement (such as INSERT, UPDATE, or DELETE) that is applied TO the consumer operator that will write the data to the target.

  1. For most jobs, the APPLY statement also includes the read activity, which uses a SELECT
FROM statement to reference the producer operator. If the APPLY statement uses a
standalone operator, it does not need the SELECT FROM statement.




2.2.2.A.1 Using DDL operator in APPLY statement:

The DDL operator can be specified in the APPLY statement in either single or multi-statement
format.
To execute each statement as its own transaction, you should have one SQL statement
per DML group (enclosed in parentheses).
If more than one statement is specified in a DML group, the operator combines them all into a
single multi-statement request and sends it tothe Teradata Database as one transaction.

Teradata Database enforces the rule that a multi-statement DML group can have only one
DDL statement and it must be the last statement in the transaction, which means the last
statement in the group.

If any statement in the group fails, then all statements in
that group are rolled back and no more groups are processed.


The following is a simplified example of a DDL operator in a single-statement format:


APPLY
'SQL statement1',
'SQL statement2',
.........
'SQL statementN'
TO OPERATOR (operator_specifications)

The following is a simplified example of a DDL operator in a multi-statement format:

APPLY
('SQL statement1a', 'SQL statement1b', .........),
('SQL statement2a', 'SQL statement2b', .........),
.........
('SQL statementNa', 'SQL statementNb', .........)
TO OPERATOR (operator_specification)


2.2.2.A. 2 Using UPDATE operator as stand alone:


Update operator can be used to deleting data from TD database.

The Update operator functions as either a standalone or a consumer operator, depending on whether or not data is required to complete the deletion.

Consider the following rules when using the DeleteTask feature:
• The Delete Task feature may not be used on a database view.
• Only one special session will be connected.
• Only one instance may be specified.
• Only one DML group may be specified.
• Only one DML DELETE statement inthe DML group may be specified.
• Only one target table may be specified.
• The first of the error tables (the acquisition error table) is not used and is ignored.
• Only one data record may be provided if using a WHERE clause.



  1. SELECT .. FROM

This specifies the
  • Operators that are used to acquire of filter data.
  • Number of instances of the operators.
  • Selected columns to be sent to the consumer operator.
  • Operator attribute values (optional)


  1. WHERE clauses, CASE DML or CASE value expressions and SELECT derived column values to filter data between source and destination





No comments:

Post a Comment