Load Operator Capabilities:
- Load Operator is a consumer operator.
- Load Operator uses Teradata FastLoad protocol to load a large volume of data at high speed into an empty table in Teradata Database.
Load Operation Phases:
Following
are the phases involved in loading the Load process.
Acquisition
Phase:
|
Data from the input stream is transmitted to the AMPs.
The AMPs 'acquire' the data.
The acquisition phase is not complete until all data sources are exhausted and all data rows are on the appropriate AMPs. The data is not yet sorted or blocked, and therefore, is not yet accessible. |
Application
Phase
|
Data is sorted,
blocked, and put into its final format.
All activity in this phase is
AMP-local.
Data is accessible after the
completion of this phase.
|
Load Operator Definition:
Following
is an example of a Load operator definition.
DEFINE
OPERATOR LOADOP
DESCRIPTION
'Teradata PT LOAD OPERATOR'
TYPE
LOAD
SCHEMA
INPUT_SCHEMA
ATTRIBUTES
(
VARCHAR
TdpId,
VARCHAR
UserName,
VARCHAR
UserPassword,
VARCHAR
TargetTable,
VARCHAR
LogTable,
VARCHAR
ErrorTable1,
VARCHAR
ErrorTable2,
VARCHAR
PrivateLogName
);
TDPID :
|
It’s
a optional Security attribute that specifies the name of the Teradata
database machine(non-mainframe platforms) or TDP(mainframe platforms) for
Export operator job.
If we
do not specify the value for the TdpId attribute, the operator uses the
default TdpID established for the user by the system administrator.
|
UserName:
|
Attribute
that specifies the user name on the Teradata Database.
|
UserPassword :
|
Attribute
that specifies the password associated with the user name.
|
PrivateLogName
|
Optional
attribute that specifies the name of a log that is maintained by the Teradata
PT
Logger
inside the public log. The private log contains all of the messages produced
by the operator.
|
TargetTable
|
It’s
the target table to be loaded.
The
table should be empty and without any secondary or join indexes.
When the target table is NOPI,
the restriction that table must be empty does not exist.
|
LogTable
|
It is
the restart log table.
Restarts
are common during staged loading operations. When additional data is
available after a job is paused, the job is restarted by submitting a second
script that specifies the additional data. The Load operator recognizes the
job as a restart, reads the restart log to determine the status of the job,
then loads the additional file.
Restarts
can also occur following any unexpected error on a database. For example, if
a table
runs
out of available space during a load operation, the job terminates, the table
is paused, and
a
checkpoint is recorded in the restart log. Pausing the job in this way allows
you to manually
increase
the available space, if needed, then restart the job because the load
operation can
restart
a job from the last checkpoint in the restart log.
(Fastload
does not have a logtable )
|
ErrorTable1
|
Error
Table1 contains most of the errors connected with data and the data
environment. The
following
types of errors are captured:
•Constraint
violations- Records that violate a range
or value constraint defined for specific columns of a table.
•Unavailable AMP- Records to be written to a non-fallback table about an
offline AMP.
•Data conversion
errors- Records that fail to convert to a
specified data type.
Jobs
can use the default names of the error tables, or names can be user-specified
as an attribute of the Load operator.
|
ErrorTable2
|
Error
Table2 contains all of the rows that have violations of the unique primary
index.
In case of Load Operator This error table is not used when the target table
has a non-unique primary index.
(in
case of update operator this table can get records from other conditions too)
Jobs
can use the default names of the error tables, or names can be user-specified
as an attribute of the Load operator.
|
Auto-generation of Error Tables:
A
target table must exist on the Teradata Database when the Load operator job is
executed.
However If we have not specified error tables (specifying
them is optional), the Load
operator automatically creates names of the error tables as follows:
• The
first error table is ttname_ET
• The
second error table is ttname_UV
where
ttname is the name of the corresponding target table.
The value of the TargetTable attribute is truncated to 27 characters before the
suffixes "_ET" and "_UV" are appended. This means that if the value of the TargetTable
attribute is a fully qualified table name and that fully qualified name exceeds
the maximum supported length of a database object (30 characters), the
generated names for the error tables may not be what you intend. In such a
situation, Teradata recommends that you provide the names of the error tables
and not rely on the Load operator to generate the names for these tables
automatically.
Duplicate Rows:
Duplicate rows, which are exact duplicates of existing
table rows, are
never inserted, even if the target table is
defined as a multiset table.
Duplicate row violations are thus not captured in either
Error Table 1 or Error Table 2. Instead, they are counted and reported in the
status log at the end of a job.
However If the target table is defined as a NoPI table, duplicate rows are inserted. NoPI tables are inherently
multiset since no duplicate row checking is possible; with NoPI tables,
duplicate rows can be on different AMPs. Therefore, no duplicate row
elimination is performed.
If a
table has a unique primary index, a circumstance where there is duplicate row
takes precedence over a duplicate primary index error, meaning that the
offending row is counted and reported, but it is not captured in Error Table 2.
ErrorLimit attribute:
ErrorLimit
attribute is used to specify a limit to the number of errors that can be
tolerated before a job is terminated.
This limit, which is specified with the ErrorLimit
attribute, represents the total number of errors written to the first error table per instance of
the Load operator, not to all instances
combined. Therefore, if an error
limit is set to 1,000, a single load instance must detect that 1,000 rows are
inserted into the error table before the job is terminated.
The
error limit can also be reached at checkpoint time; see the examples below.
Consider
these examples if there are two instances running and the Error Limit has been
set to 1000.
• If
either instance by itself reaches 1000, it will terminate the job by returning
a fatal error.
• If
instance #1 processes 500 error rows and instance #2 processes 500 error rows
but does not reach a checkpoint. The job will continue processing.
• If
instance #1 processes 500 error rows and instance #2 processes 500 error rows
but does reach a checkpoint. The total number of error rows for all instances
combined is determined at checkpoint time and at the end of the Acquisition
Phase. If the total of all instances exceeds the error limit at that time, the
job will terminate with an error.
Staged Loading:
Staged
loading is the ability to pause an active load operation until additional data
is available.
Teradata
PT can load the first two files, pause the Load operation, and then load the
third file when it is available.
Staged loading is set by the attribute PauseAcq =
'Y', which prevents the Load operator from
proceeding to the application phase. Each stage of the load operation is
accomplished with a separate job script: one for the acquisition phase, and one
for the application phase
For
example:
To
accomplish the scenario with three files (one of which is unavailable until the
next day), run Job1 on Day 1 using the two existing files as input to the load,
with the PauseAcq = 'Y'setting.
When
this stage of the job is finished, the target table is paused and becomes
inaccessible to users. Attempts to access the target table (or the error
tables) return the following error message:
Operation
Not Allowed <tablename> is being loaded
On Day
2, restart the paused job by running Job2 using the third, newly available file
as input. For this stage, set PauseAcq = 'N'. When this stage of the job
finishes, the table is fully loaded and ready for access.
Note: A paused table, though
inaccessible, can be dropped.
Sessions and Instances:
A
minimum and a maximum number of sessions (the session limits) can be specified
for the Load operator. Both can also be specified for the Export and Update
operators.
Consider
the following usage notes which apply to all three types of operators.
• The
maximum sessions connected can never exceed the number of available AMPs in the
system, even if a larger number is specified.
• The
default is one session per available AMP.
• For
the MinSessions attribute, the minimum specification is one.
• The
MaxSessions attribute can be set to a number smaller than the number of AMPs on
the database server if fewer sessions are suitable for the job.
Limits on Load jobs:
The
Load operator requires one designated Teradata Database loader task.
The
number of concurrent load tasks is, however, configurable in the Teradata
Database environment using the same MaxLoadTasks field control used by
FastLoad, FastExport, and MultiLoad.
Checkpointing and Restarting:
The Load operator
takes checkpoints at the beginning and end of the acquisition phase.
More granular checkpoints during the acquisition phase
can be specified using the command line option -z when running Teradata PT
using the tbuild command. The -z option specifies checkpoint intervals in terms of
seconds.
The
following command string is an example of the -z option:
tbuild
-f <file name> -z 30
In this
command, the -f option indicates the script that is input to tbuild, and the -z
option indicates that a checkpoint will be taken every 30 seconds.
The
DEFINE JOB statement can also be used to specify a checkpoint value.
Checkpointing
during the application phase is managed internally by the Teradata Database,
and therefore is not user-controlled.
Load Operator as Standalone Operator:
The
Load operator can function as a standalone operator and supports an APPLY
statement with no SELECT statement and no INSERT statement.
When we
use LOAD as a standalone operator, use one of the following definitions:
TYPE
LOAD STANDALONE
TYPE
LOAD
Note that The STANDALONE keyword is
optional.
Example:
DEFINE
JOB LOAD_USER_DATA
(
DEFINE
OPERATOR LOAD_OPERATOR
TYPE
LOAD
(
.
.
.
);
.
.
.
APPLY
TO
OPERATOR (LOAD_OPERATOR[1]);
);
Note:
Use LOAD as a standalone operator to apply data on the target table without
sending more data.
We
might use this in a following scenario. Say we have 3 files loaded at separate
times in a day. In this case we create 3 TPT scripts each with PauseAcq
attribute as 'Y' and then a 4 script with only standalone LOAD operator which
triggers the application phase and loads the data actually to the table.
Hi can you please tell how can I use the active directory in this TPT . . ? Also can I link this script with FTP server I dont have to run it over and over again?
ReplyDeleteExample:
https://downloads.teradata.com/tools/articles/teradata-parallel-transporter-active-and-batch-directory-scans