Following is an example of a TPT script:

Following are the points to note:
- The section in red shows an optional job header containing C-style comments. This section can be used to record useful information such as who created the script and how to run it.
- The section in grey is the declarative section of the script. It uses DEFINE statements to define the Teradata PT objects (operators and schemas) needed for the job.
- The section in black is the executable section of the script. It specifies the processing statements that initiate the actions that read/extract, filter, insert, update, and delete data.
APPLY statements specify:
- The operations to be performed.
- The operators to be used.
- The source and destination of the data.
- Filtering options and the optional degree of parallelism.
/**************************************************************/
/* Execution:                                                 */
/*                                                            */
/* On Unix and Windows, here is the command to execute this   */
/* script:                                                    */
/*                                                            */
/*    tbuild -f GT11.txt -v jobvars.txt                       */
/*                                                            */
/* For the mainframe use the accompanying JCL.                */
/*                                                            */
/**************************************************************/
DEFINE JOB BACKUP_EMPLOYEE_TABLE
DESCRIPTION 'BACKUP SAMPLE EMPLOYEE TABLE'
(
  DEFINE SCHEMA EMPLOYEE_SCHEMA
  DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'
  (
    EMP_ID   INTEGER,
    EMP_NAME CHAR(10)
  );

  DEFINE OPERATOR DDL_OPERATOR()
  DESCRIPTION 'TERADATA PARALLEL TRANSPORTER DDL OPERATOR'
  TYPE DDL
  ATTRIBUTES
  (
    VARCHAR PrivateLogName = 'GT11_ddloper_log',
    VARCHAR TdpId          = @MyTdpId,
    VARCHAR UserName       = @MyUserName,
    VARCHAR UserPassword   = @MyPassword,
    VARCHAR AccountID,
    VARCHAR ErrorList      = '3807'
  );

  DEFINE OPERATOR LOAD_OPERATOR()
  DESCRIPTION 'TERADATA PARALLEL TRANSPORTER LOAD OPERATOR'
  TYPE LOAD
  SCHEMA EMPLOYEE_SCHEMA
  ATTRIBUTES
  (
    VARCHAR PrivateLogName = 'GT11_loadoper_privatelog',
    INTEGER MaxSessions    = 32,
    INTEGER MinSessions    = 1,
    VARCHAR TargetTable    = 'TARGET_EMP_TABLE',
    VARCHAR TdpId          = @MyTdpId,
    VARCHAR UserName       = @MyUserName,
    VARCHAR UserPassword   = @MyPassword,
    VARCHAR AccountId,
    VARCHAR ErrorTable1    = 'GT11_LOADOPER_ERRTABLE1',
    VARCHAR ErrorTable2    = 'GT11_LOADOPER_ERRTABLE2',
    VARCHAR LogTable       = 'GT11_LOADOPER_LOGTABLE'
  );

  DEFINE OPERATOR EXPORT_OPERATOR()
  DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'
  TYPE EXPORT
  SCHEMA EMPLOYEE_SCHEMA
  ATTRIBUTES
  (
    VARCHAR PrivateLogName = 'GT11_exportoper_privatelog',
    INTEGER MaxSessions    = 32,
    INTEGER MinSessions    = 1,
    VARCHAR TdpId          = @MyTdpId,
    VARCHAR UserName       = @MyUserName,
    VARCHAR UserPassword   = @MyPassword,
    VARCHAR AccountId,
    VARCHAR SelectStmt     = 'SELECT * FROM SOURCE_EMP_TABLE;'
  );

  STEP setup
  (
    APPLY
      ('DROP TABLE SOURCE_EMP_TABLE;'),
      ('DROP TABLE TARGET_EMP_TABLE;'),
      ('DROP TABLE GT11_LOADOPER_ERRTABLE1;'),
      ('DROP TABLE GT11_LOADOPER_ERRTABLE2;'),
      ('DROP TABLE GT11_LOADOPER_LOGTABLE;'),
      ('CREATE TABLE SOURCE_EMP_TABLE(EMP_ID INTEGER, EMP_NAME CHAR(10));'),
      ('CREATE TABLE TARGET_EMP_TABLE(EMP_ID INTEGER, EMP_NAME CHAR(10));'),
      ('INSERT INTO SOURCE_EMP_TABLE(1,''JOHN'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(2,''PETER'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(3,''FRANK'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(4,''MARY'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(5,''ELLEN'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(6,''MICHAEL'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(7,''SAM'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(8,''JONATHAN'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(9,''MICHELLE'');'),
      ('INSERT INTO SOURCE_EMP_TABLE(10,''ALICE'');')
    TO OPERATOR (DDL_OPERATOR());
  );

  STEP load_data
  (
    APPLY
      ('INSERT INTO TARGET_EMP_TABLE (:EMP_ID, :EMP_NAME);')
    TO OPERATOR (LOAD_OPERATOR() [1])
    SELECT * FROM OPERATOR (EXPORT_OPERATOR() [1]);
  );
);
2.2.1 Object Definition Statements:
In the declarative section we define all the objects referenced by the script. The definition statements include:
- DEFINE JOB (Required).
This names the TPT job; providing a name is mandatory. Use a descriptive name, e.g. BACKUP_EMPLOYEE_TABLE. This name need not be the same as the name of the TPT script file; we can choose to keep them the same, but it is not necessary.
Note that the job name shown in the DEFINE JOB statement is not necessarily the same as the "jobname" specified in the tbuild command when launching the job, although it can be.
Note: the tbuild command is used to launch the TPT job.
The DEFINE JOB statement contains all the other object definitions (schemas and operators) and also the processing statements.
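To illustrate the note above, a launch of this job with an explicit jobname might look like the following (the script file name comes from the header comment; the jobname is a placeholder chosen for this sketch):

```
tbuild -f GT11.txt -v jobvars.txt my_backup_job
```

Here -f names the script file, -v names a job variables file, and the trailing argument is the jobname under which the run is recorded; it need not match the name in DEFINE JOB.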
- DEFINE SCHEMA (Required)
Teradata PT requires that the job script describe the structure of the data to be processed, that is, the columns in table rows or the fields in file records. This description is called the schema. Schemas are created using the DEFINE SCHEMA statement.
Ex:
DEFINE SCHEMA EMPLOYEE_SCHEMA
DESCRIPTION 'SAMPLE EMPLOYEE SCHEMA'   --> Short description.
(                                      --> Opening parenthesis.
  EMP_ID   INTEGER,
  EMP_NAME CHAR(10)
);                                     --> Note that it ends with a closing parenthesis and a semicolon.
DEFINE OPERATOR EXPORT_OPERATOR()
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'
TYPE EXPORT
SCHEMA EMPLOYEE_SCHEMA   --> The value following the keyword SCHEMA in a DEFINE OPERATOR statement identifies the schema that the operator will use to process job data.
Each distinct data structure requires a separate DEFINE SCHEMA statement.
2.2.1.B.1 Following are certain guidelines for defining schemas:
- The schema referenced in a producer operator definition describes the structure of the source data.
- The schema referenced in a consumer operator definition describes the structure of the data that will be loaded into the target. The consumer operator schema can be coded as SCHEMA * (a deferred schema), which means that it will accept the schema of the output data from the producer.
- You can use the same schema for multiple operators.
- A single operator can only use one schema, unless it is a filter operator.
- The column names in a schema definition in a Teradata PT script do not have to match the actual column names of the target table, but their data types must match exactly.
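As a sketch of a deferred schema, a consumer operator along the lines of the sample LOAD_OPERATOR could use SCHEMA * instead of a named schema (the attributes shown are trimmed for brevity):

```
DEFINE OPERATOR LOAD_OPERATOR()
DESCRIPTION 'CONSUMER WITH A DEFERRED SCHEMA'
TYPE LOAD
SCHEMA *                 /* deferred: inherits the producer's output schema */
ATTRIBUTES
(
  VARCHAR TargetTable = 'TARGET_EMP_TABLE',
  VARCHAR TdpId       = @MyTdpId,
  VARCHAR UserName    = @MyUserName,
  VARCHAR UserPassword = @MyPassword
);
```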
2.2.1.B.2 UNION-Compatible Schemas:
A single script often requires two schemas, one each for the source and target. It is also possible to use multiple schemas for the source data if all rows are UNION-compatible. Two schemas are UNION-compatible if their corresponding columns have exactly the same data type attributes (type, length, precision, and scale); that is, other than their column names, the schemas are identical.
If the schemas are UNION-compatible, Teradata PT combines data from the sources, each being extracted by a different producer operator using a different schema, into a single output data stream using its UNION ALL feature (this will be explained in detail later on).
Ex:
DEFINE SCHEMA ABSENTEES
DESCRIPTION 'Employees who failed to attend the training session'
(
  ABSENTEE_NAME CHAR(24),
  EXCUSE        VARCHAR(256)
);
DEFINE SCHEMA PRESENTERS
DESCRIPTION 'Employees who gave presentations at the training session'
(
  PRESENTER_NAME     CHAR(24),
  PRESENTATION_TOPIC VARCHAR(128)
);
Note that these two schemas are not UNION-compatible: although the first columns match, the second columns differ in length (VARCHAR(256) versus VARCHAR(128)). For the schemas to be UNION-compatible, the corresponding columns would have to have identical type, length, precision, and scale.
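Assuming two producer operators whose schemas are UNION-compatible (the operator and table names below are hypothetical), the UNION ALL feature might be used in an APPLY statement like this:

```
APPLY
  ('INSERT INTO COMBINED_EMP_TABLE (:EMP_NAME, :EMP_INFO);')
TO OPERATOR (LOAD_OPERATOR())
SELECT * FROM OPERATOR (EXPORT_OPERATOR_A())
UNION ALL
SELECT * FROM OPERATOR (EXPORT_OPERATOR_B());
```

Each producer extracts from its own source, and Teradata PT merges the two streams into a single output stream for the consumer.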
- DEFINE OPERATOR (Required)
Defines an operator and specifies the operator attributes to which values will be assigned. The choice of operator is based on the type of data source, the characteristics of the target tables, and the specific operations to be performed.
DEFINE OPERATOR EXPORT_OPERATOR()   --> EXPORT_OPERATOR is the name of the operator.
DESCRIPTION 'TERADATA PARALLEL TRANSPORTER EXPORT OPERATOR'   --> Optional description.
TYPE EXPORT                         --> We need to specify the type of the operator.
SCHEMA EMPLOYEE_SCHEMA              --> Here we specify the name of the predefined schema object, or make use of SCHEMA * as a deferred schema.
ATTRIBUTES
(
  VARCHAR PrivateLogName = 'GT11_exportoper_privatelog',
  INTEGER MaxSessions    = 32,      --> Note that attributes are separated by commas. Values for all mandatory attributes must be provided.
  INTEGER MinSessions    = 1,
  VARCHAR TdpId          = @MyTdpId,
  VARCHAR UserName       = @MyUserName,
  VARCHAR UserPassword   = @MyPassword,
  VARCHAR AccountId,
  VARCHAR SelectStmt     = 'SELECT * FROM SOURCE_EMP_TABLE;'
);
Producer operators can read data from Teradata tables, other database tables, or files. Note that for producer operators we need to specify the input schema. A deferred schema, specified as SCHEMA *, is not supported for producer operators.
A consumer operator "consumes" data from the data stream in order to write it to the Teradata Database or to an external data target, such as a flat file. With consumer operators we can use SCHEMA * if the input and output schemas are the same.
Teradata PT limits the number of tables consumer operators can load simultaneously as follows:
- LOAD --> 1 table
- UPDATE --> 5 tables
- STREAM --> 127 tables
- SQL Inserter --> 1 table
Standalone operators, such as DDL, OS Command, and the standalone Update operator, can be used for processing that does not involve sending data to or receiving data from the Teradata PT operator interface, and thus do not use data streams.
It is very important to note that if a standalone operator is used, it must be:
- the only operator used in the script, if job steps are not used
- the only operator used in the job step, if job steps are used
Attributes for the operator are specified within the ATTRIBUTES section as follows:
ATTRIBUTES
(
  VARCHAR PrivateLogName = 'GT11_exportoper_privatelog',
  INTEGER MaxSessions    = 32,
  INTEGER MinSessions    = 1,
  VARCHAR TdpId          = @MyTdpId,
  VARCHAR UserName       = @MyUserName,
  VARCHAR UserPassword   = @MyPassword,
  VARCHAR AccountId,
  VARCHAR SelectStmt     = 'SELECT * FROM SOURCE_EMP_TABLE;'
);
Attributes must be declared in the operator definition when:
• they are required by the operator.
• we want to assign a value that is different from the default attribute value.
• we want the option of assigning an overriding value (for either the default or the assigned value) in the APPLY statement. Note that even if we only want to override a value in the APPLY clause, we must still declare the attribute in the ATTRIBUTES section. In the APPLY clause of the executable section we can assign values to the attributes.
If an attribute value is assigned in the operator definition and is also assigned in the APPLY statement, the APPLY value overrides the value in the operator definition. The override value applies only to the occurrence of the operator where the override value is assigned.
The use of VARCHAR and INTEGER is optional when the attribute declaration includes a value assignment, such as for PrivateLogName and MaxSessions above.
For mandatory attributes we must provide values. For optional attributes we can provide values or let them take their default values.
2.2.1.C.1 Multivalued (Array Type) Attributes:
In TPT, a few operator attributes can take multiple values. Array attribute values can be specified as part of:
- a DEFINE OPERATOR statement.
- a reference to an operator in an APPLY statement.
Available array attributes are shown in the following table:
Examples of how an array value for an attribute would appear in a DEFINE OPERATOR statement or an APPLY statement:
VARCHAR ARRAY TraceLevel = [ 'CLI', 'OPER' ]
VARCHAR TraceLevel = [ 'CLI', 'OPER' ]
TraceLevel = [ 'CLI', 'OPER' ]
Array values can be assigned in a series, as shown in the following examples:
VARCHAR ARRAY TargetTable = ['table1', 'table2', …, 'tableN']
VARCHAR TargetTable = ['table1', 'table2', …, 'tableN']
TargetTable = ['table1', 'table2', …, 'tableN']
Using the ARRAY keyword in assigning an array value is optional.
An array value containing a single member (for example, ['x'] or [2]) is still considered a valid array value; in this case the array dimension is 1. However, even this single value must be specified through array notation, that is to say, enclosed in [ ].
To omit some of the values of an array attribute, for example to assign a first and a third value but not the second, you can do either of the following:
• specify a value of NULL, as follows:
VARCHAR FILE_NAMES = ['/first', NULL, '/third']
• mark the omitted value with consecutive commas, as follows:
VARCHAR FILE_NAMES = ['/first', , '/third']
The following example shows an array attribute value assigned as part of a SELECT statement:
SELECT * FROM OPERATOR (reader ATTR (FILE_NAMES = ['/first', NULL, '/third'], MODE = 'read'))
When you declare an operator attribute in an operator definition, you have the option of delaying its value assignment until you run the job script. This is done using job variables.
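For instance, the @MyTdpId, @MyUserName, and @MyPassword references in the sample script are resolved from the job variables file passed with tbuild -v. A minimal jobvars.txt for this job might look like the following sketch (the values are placeholders; check your TPT version's documentation for the exact file conventions):

```
MyTdpId    = 'mytdpid',
MyUserName = 'myuser',
MyPassword = 'mypassword'
```

Keeping logon details in a job variables file means the same script can be reused across systems without editing.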
Object attribute values can be assigned at several locations within the job script. The following list shows the locations where attribute values can be assigned, in the order they are processed, from first to last. The last value processed is used in the job:
1. DEFINE OPERATOR
2. As part of an APPLY TO...SELECT FROM statement
So if a value is assigned both in the DEFINE OPERATOR and in the APPLY clause, the one in the APPLY clause takes effect.
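As a sketch of such an override, using the operators from the sample script (the override value of 8 is illustrative), an APPLY reference can supply attribute values inline:

```
APPLY
  ('INSERT INTO TARGET_EMP_TABLE (:EMP_ID, :EMP_NAME);')
TO OPERATOR (LOAD_OPERATOR() ATTRIBUTES (MaxSessions = 8))
SELECT * FROM OPERATOR (EXPORT_OPERATOR());
```

The MaxSessions = 8 here overrides the 32 assigned in the operator definition, but only for this occurrence of LOAD_OPERATOR.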
2.2.2 Processing Statements:
The executable section of the job contains the APPLY statements that specify the operations to be performed by the job and the objects that will perform them.
Processing statements include the following:
- APPLY .. TO
This specifies the:
- Operators that will be used for the load and update of data.
- Number of instances of the operators.
- Operator attribute values (optional; normally we provide values for all the attributes, such as TdpId, in the operator definition, but we can choose to assign or override the values in the APPLY clause).
The APPLY statement may also include data transformations by including filter operators or through the use of derived columns in its SELECT FROM.
A job script must always contain at least one APPLY statement, and if the job contains multiple steps, each step must have an APPLY statement.
An APPLY statement typically contains two parts, which must appear in the order shown:
- A DML statement (such as INSERT, UPDATE, or DELETE) that is applied TO the consumer operator that will write the data to the target.
- For most jobs, the read activity, which uses a SELECT FROM statement to reference the producer operator. If the APPLY statement uses a standalone operator, it does not need the SELECT FROM statement.
2.2.2.A.1 Using the DDL operator in an APPLY statement:
The DDL operator can be specified in the APPLY statement in either single-statement or multi-statement format.
To execute each statement as its own transaction, you should have one SQL statement per DML group (enclosed in parentheses). If more than one statement is specified in a DML group, the operator combines them all into a single multi-statement request and sends it to the Teradata Database as one transaction.
The Teradata Database enforces the rule that a multi-statement DML group can have only one DDL statement, and it must be the last statement in the transaction, which means the last statement in the group.
If any statement in the group fails, then all statements in that group are rolled back and no more groups are processed.
The following is a simplified example of a DDL operator in a single-statement format:
APPLY
'SQL statement1',
'SQL statement2',
.........
'SQL statementN'
TO OPERATOR (operator_specifications)
The following is a simplified example of a DDL operator in a multi-statement format:
APPLY
('SQL statement1a', 'SQL statement1b', .........),
('SQL statement2a', 'SQL statement2b', .........),
.........
('SQL statementNa', 'SQL statementNb', .........)
TO OPERATOR (operator_specification)
2.2.2.A.2 Using the UPDATE operator as a standalone operator:
The Update operator can be used to delete data from the Teradata Database. The Update operator functions as either a standalone or a consumer operator, depending on whether or not data is required to complete the deletion.
Consider the following rules when using the DeleteTask feature:
• The DeleteTask feature may not be used on a database view.
• Only one special session will be connected.
• Only one instance may be specified.
• Only one DML group may be specified.
• Only one DML DELETE statement in the DML group may be specified.
• Only one target table may be specified.
• The first of the error tables (the acquisition error table) is not used and is ignored.
• Only one data record may be provided if using a WHERE clause.
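A sketch of a standalone delete using the Update operator follows. The operator name and predicate are hypothetical, and the sketch assumes the operator's DeleteTask attribute is used to enable the delete task; since no data stream is needed, there is no SELECT FROM:

```
APPLY
  ('DELETE FROM TARGET_EMP_TABLE WHERE EMP_ID > 100;')
TO OPERATOR (UPDATE_OPERATOR() ATTRIBUTES (DeleteTask = 'Y'));
```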
- SELECT .. FROM
This specifies the:
- Operators that are used to acquire or filter data.
- Number of instances of the operators.
- Selected columns to be sent to the consumer operator.
- Operator attribute values (optional).
- WHERE clauses, CASE DML or CASE value expressions, and SELECT derived column values to filter data between source and destination.
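As an illustration of filtering in the SELECT clause, using the operators and columns from the sample script (the predicate itself is illustrative), only rows satisfying the WHERE clause reach the consumer:

```
APPLY
  ('INSERT INTO TARGET_EMP_TABLE (:EMP_ID, :EMP_NAME);')
TO OPERATOR (LOAD_OPERATOR())
SELECT EMP_ID, EMP_NAME FROM OPERATOR (EXPORT_OPERATOR())
WHERE EMP_ID <= 5;
```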