Sunday 17 November 2013

11.2 Teradata Parallel Transporter - Stream Operator - Additional Attributes


Attributes in Detail:

Each entry below gives the attribute syntax, whether it is Required (R) or Optional (O), and a description.
AppendErrorTable = ‘value’
O
Optional attribute that specifies whether the Stream operator uses an existing error table.

Valid values are:

• 'No' = The Stream operator will not use an existing error table (default).
• ‘Yes’ = The Stream operator will use the existing error table or create the error table if it does not exist.

If the error table exists, the Stream operator displays the number of rows in the error table.
 
If the structure of the existing error table is not compatible with the
error table that the Stream operator expects, the Stream operator
terminates the job with an error message.
 
By default (that is, when AppendErrorTable = 'No'), the Stream operator terminates the job with an error message if the error table already exists.
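For example, here is a minimal sketch of the attribute settings involved (the database and table names are hypothetical): the first job keeps its error table, and a follow-up job appends to it.

/* Job 1: keep the error table at the end of the job */
VARCHAR ErrorTable       = 'mydb.stream_et',
VARCHAR DropErrorTable   = 'No',

/* Job 2: reuse the table left behind by job 1 */
VARCHAR ErrorTable       = 'mydb.stream_et',
VARCHAR AppendErrorTable = 'Yes',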

ArraySupport = ‘option’
O
Optional attribute that specifies whether the Stream operator uses the Array Support feature for the job.

Valid values are:
• ‘On’ = The Stream operator will use ArraySupport for the entire job.
• ‘Off’ = The Stream operator will not use ArraySupport.

Even if the value for ArraySupport is not specified, Array Support is still enabled and the Stream operator will use it for the entire job if the following criteria are met:
 
• Both Teradata Database and CLIv2 support the Array Support feature.
• The DML statement is a single DML statement or an atomic UPSERT statement.
• The job step has a single DML group if Serialize is On.

If any of these criteria are not met, the default value is 'Off' and the Stream operator will not use Array Support.
 
If the value for ArraySupport is set to ‘On’ and either Teradata Database or Teradata CLIv2 does not support the Array Support feature, the Stream operator will terminate with a fatal error.

If the ARRAYSUPPORT DML option is used as part of the APPLY
statement for a job, the DML value will override the value specified for the Stream operator ArraySupport attribute.

Note that ArraySupport can be specified in two places:
1) As an operator attribute.
2) As the ARRAYSUPPORT DML option in an APPLY statement.
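A hedged sketch of the DML-level override follows (the DML statement is illustrative, and STREAM_LOAD_OPERATOR refers to the operator defined later in this document; the option is written in the same position as the SERIALIZE option shown later):

APPLY
('INSERT INTO emp (:empno, :dept_num);')
ARRAYSUPPORT OFF     /* overrides an operator-level ArraySupport = 'On' */
TO OPERATOR (STREAM_LOAD_OPERATOR[1]);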
Buffers = buffers
O
Optional attribute that specifies the number of request buffers to be used.

The minimum value is 2; there is no upper limit.
The default value is 3.

The maximum number of request buffers that may be allocated is the number of buffers multiplied by the number of connected sessions (Buffers * connected_sessions).

Request buffers are a global resource, so buffers are assigned to any
session as needed, and then returned to a free pool. At any point in
time, the number of request buffers assigned to a session can vary from
zero to Buffers * connected_sessions.
DropErrorTable = 'value'
O

Optional attribute that specifies whether the Stream operator drops the error table at the end of a job.

Valid values are:

• ‘Yes’ = The Stream operator drops the error table when it is empty at
the end of a job (default). Teradata PT automatically executes a
DROP TABLE statement.

• ‘No’ = The Stream operator will not drop the error table even if it is
empty at the end of a job.

If the error table is not dropped, it can be used when AppendErrorTable is set to ‘Yes’ at the beginning of the next job.

DropMacro = 'value'
O
Optional attribute that instructs the Stream operator whether to drop
macros or keep them for future use.

VARCHAR DropMacro = '<Y[es]|N[o]>'

By default, the Stream operator drops macros at the end of a successful job. When the value of DropMacro is 'N' or 'No', the macros remain in the Teradata Database until a DROP MACRO statement is issued against them.
ErrorLimit = limit
O
Optional attribute that specifies the approximate number of records that can be stored in the error table before the Stream operator job is terminated.

This number is approximate because the Stream operator sends
multiple rows of data at a time to Teradata Database. By the time
Teradata PT processes the message indicating that the error limit has
been exceeded, it may have loaded more records into the error table
than the actual number specified in the error limit.

The ErrorLimit specification must be greater than 0.

Specifying an invalid value will cause the Stream operator job to terminate.

By default, the ErrorLimit value is unlimited.
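A minimal declaration sketch (the limit value of 1000 is illustrative):

INTEGER ErrorLimit = 1000,   /* terminate the job after roughly 1,000 error-table rows */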
ErrorTable = 'etname'
O
Optional attribute that specifies the name of the error table.

This table contains information concerning data conversion errors, constraint violations, and other error conditions.

If the database for the error table is not specified, the table is placed in
the database associated with the user logon.

By default, the error table must be a new table. This default can be
changed. If the AppendErrorTable attribute is set to 'Yes' or 'Y', then the
error table can be a new or existing table.

If a name is not supplied, the Stream operator creates one.

User-supplied names for error tables must not exceed 30 bytes.

LogTable = 'ltname'
R
Required attribute that specifies the name of the restart log table for
checkpoint information.

If the restart log table does not exist, the Stream operator creates it. If it exists, the Stream operator restarts from the last checkpoint.

The following privileges are required on the restart log table:
• SELECT
• INSERT
• DELETE

The following privileges are required on the database that contains the
restart log table.
• DROP
• CREATE

The Stream operator automatically maintains the restart log table.
Manipulating the restart log table in any way invalidates the restart
capability.

If the restart log table name is not fully qualified, it is created under the user's default (logon) database. Alternatively, you can specify a different default database using the WorkingDatabase attribute.

MacroDatabase = 'dbname'
O
Optional attribute that specifies the database to contain any macros
used by the Stream operator.

The default macro database is the restart log table database.

MaxSessions = maxsessions
O
Optional attribute that specifies the maximum number of sessions to
log on.

The MaxSessions value must be greater than 0.
Specifying a value less than 1 terminates the job.

The default is one session for each operator instance.
(Note that this default differs from the Export, Load, and Update operators.)

The sessions are distributed across instances: the main instance calculates an even distribution of the Stream operator sessions among the number of instances.
For example, if there are 4 instances and 16 Stream operator sessions, each instance will log on 4 Stream operator sessions.

MinSessions = minsessions
O
Optional attribute that specifies the minimum number of sessions
required for the Stream operator job to continue.

The MinSessions value must be greater than 0 and less than or equal to
the maximum number of Stream operator sessions.

Specifying a value less than 1 terminates the job.

The default is 1.
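A hedged sketch combining the two session attributes with multiple instances (the values mirror the 16-session, 4-instance example above; the operator name is illustrative):

/* In DEFINE OPERATOR: */
INTEGER MaxSessions = 16,   /* 16 sessions distributed across the instances */
INTEGER MinSessions = 4,    /* the job continues only if at least 4 sessions log on */

/* In the APPLY statement, request 4 instances; each logs on 4 sessions: */
APPLY <dml> TO OPERATOR (Stream_Oper[4]);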
OperatorCommandID = 'commandID'
O
Although you can specify rate and periodicity values using Stream
operator attributes, you may not know the optimal values for a specific
job step until after the job has begun running.

OperatorCommandID allows you to identify a specific reference of a Stream operator to which you can assign new rate or periodicity values after the job has begun, using the twbcmd utility.

Teradata PT generates a default value for OperatorCommandID, composed of <operator object name> + <process Id>, for each copy of the operator in the APPLY specification.

If you want to assign another identifier, do the following:

1 Declare the OperatorCommandID attribute in the DEFINE OPERATOR statement for the Stream operator.

2 Optionally assign a value to the OperatorCommandID attribute in a referenced copy of the Stream operator (in an APPLY statement). If no value is assigned, Teradata PT provides a system-generated value. A useful OperatorCommandID value might be the number of the job step in which you want to change the Rate, as follows:

APPLY <dml1> TO OPERATOR ( Stream_Oper[2]
ATTRIBUTES ( OperatorCommandID = 'ratestep#1')),
APPLY <dml2> TO OPERATOR ( Stream_Oper[3]
ATTRIBUTES ( OperatorCommandID = 'ratestep#2')),

3 Use the twbcmd utility to assign a Rate value to a specific Stream operator copy.
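A minimal sketch of step 1 above (the operator name is illustrative; the attribute is only declared here, and a value is assigned in the APPLY statement as shown above):

DEFINE OPERATOR Stream_Oper
TYPE STREAM
SCHEMA *
ATTRIBUTES
(
VARCHAR OperatorCommandID,   /* declared without a value; assigned per APPLY reference */
VARCHAR TdpId,
VARCHAR UserName,
VARCHAR UserPassword,
VARCHAR LogTable
);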
Pack = number
O
Optional attribute that specifies the number of statements to pack into
a multiple statement request.

The maximum value is 2400.

The default value is 20.
PackMaximum = 'packmax'
O
Optional attribute that requests the Stream operator to dynamically
determine the maximum possible pack factor for the current Stream
job.

The PackMaximum values are:
• 'No' ('N') = No pack (default)
• 'Yes' ('Y') = Determine maximum possible pack factor
Periodicity = periodicity
O
Optional attribute that specifies that the DML statements sent by the Stream operator to the Teradata Database are to be distributed as evenly as possible over each one-minute interval.

The periodicity value sets the number of sub-intervals per minute. Periodicity facilitates the orderly and efficient use of system resources.

For example: If the statement rate is 1600 and the periodicity value is
10, then the maximum number of statements processed is 160 (1600/
10) statements every 6 (60/10) seconds.

Use of the Periodicity attribute is subject to the following conditions
and rules:

• The valid values are integers between 1 and 600.
• The default value is 4, which means four 15-second periods per minute.
• If the statement rate is unlimited, the Periodicity value is ignored; this attribute therefore works in conjunction with the Rate attribute.
• While the job is running, users can change the Periodicity value
using the Teradata PT External command interface utility twbcmd.
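A minimal declaration sketch using the example numbers above:

INTEGER Rate = 1600,        /* at most 1600 statements per minute */
INTEGER Periodicity = 10,   /* 10 sub-intervals: at most 160 statements every 6 seconds */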

QueueErrorTable = ‘option’
O

Optional attribute that specifies whether the error table is a queue table.

Valid values are:
• ‘Yes’ (‘Y’) = Create the error table as a queue table.
• ‘No’ (‘N’) = (Default) Create the error table as a non-queue table.

This attribute is unique to the Stream operator and to the error table in the Stream operator.

This attribute is especially useful for capturing errors that result from the SELECT AND CONSUME database operation, which returns rows and deletes them.

Using an error table as a queue table can eliminate the need to delete the rows in the error table.
Rate = statement rate
O
Optional attribute that specifies the maximum number of DML statements per minute that the Stream operator can submit to the Teradata Database.

Use of the Rate attribute is subject to the following conditions and
rules:
• The statement rate must be a positive integer.
• If the statement rate is not specified, the rate is unlimited.
• If the statement rate is less than the statement packing factor, the
Stream operator sends requests smaller than the packing factor.
• If the statement rate is invalid, Stream Operator will display an error
message and terminate with a return code of 8.
• While the job is running, users can change the statement rate value
using the Teradata PT External command interface utility twbcmd.

Robust = 'robust'
O
Optional attribute that specifies whether or not to use robust restart
logic for recovery/restart operations.

In "robust mode," one row is written to the restart log table for every request issued (one entry per multistatement request, so a higher Pack attribute means fewer log entries).

This collection of rows in the restart log table can be referred to as the request log. Because a request is guaranteed by the Teradata Database to either completely finish or completely roll back, the request log will always accurately reflect the completion status of an import.


The Robust values are:

• 'Yes' ('Y') = Use robust restart logic (default). In the robust mode,
for each packed request, a number of “partial checkpoint” rows are written to the log between checkpoints. The rows are deleted each time a checkpoint is written.

In robust recovery mode, the Stream operator must first ascertain how much processing was completed since the last logged checkpoint. It does this by reading back a set of "partial checkpoints" from the Teradata Database, sorting them, and then reprocessing all transactions that were left incomplete when the job was interrupted.


• 'No' ('N') = Use simple restart logic.
In this case, restarts cause the Stream operator to begin where the last checkpoint occurred in the job.
Any processing that occurred after the checkpoint is redone. (Note that, by default, a checkpoint is taken before and after the data is sent.)

This method does not have the extra overhead of the additional database writes in the robust logic, and should be adequate for DML statements that can be repeated without changing the results of the operation.
TenacityHours = hours
O
Optional attribute that specifies the number of hours that the Stream operator continues trying to log on when the maximum number of sessions are already running on the Teradata Database.

The default value is 4 hours.

To enable the tenacity feature, the hours value must be greater than 0. Specifying a value of 0 disables the tenacity feature.

Specifying a value less than 0 terminates the Stream operator job.
TenacitySleep = minutes
O
Optional attribute that specifies the number of minutes that the Stream
operator job pauses before retrying a log on operation when the
maximum number of sessions are already running on the Teradata
Database.

The minutes value must be greater than 0.

If you specify a value less than 1, the Stream operator responds with an error message and terminates the job.

The default is 6 minutes.
WorkingDatabase = 'dbname'
O
Optional attribute that specifies a database other than the logon
database as the default database.

The name of the database that is specified with this attribute is used in
the Teradata SQL DATABASE statement that is sent by the operator
immediately after connecting the two SQL sessions.

UserName = 'userid'
R
Same as in Export operator.
UserPassword = 'password'
R
Same as in Export operator.
TraceLevel = 'level'
O
Same as in Export operator.
QueryBandSessInfo = '<Query Band expression>'
O
Same as in Export operator.
LogonMech = ‘string’
O
Same as in Export operator.
LogonMechData = ‘data’
O
Same as in Export operator.
LogSQL = 'option'
O
Same as in Export operator.
DataEncryption = ‘option’
O
Same as in Export operator.
DateForm = 'datatype'
O
Same as in Export operator.
AccountId = 'acctid'
O
Optional attribute that specifies the account associated with the user
name.

If omitted, it defaults to the account identifier of the immediate owner
database.
NotifyExit = 'inmodname'
O
Attribute that specifies the name of the user-defined notify exit routine
with an entry point named _dynamn. If no value is supplied, the
following default name is used:
• libnotfyext.dll for Windows platforms
• libnotfyext.sl for HP-UX platforms
• libnotfyext.so for all other UNIX platforms
• NOTFYEXT for z/OS platforms

NotifyLevel = 'notifylevel'
O
Optional attribute that specifies the level at which certain events are
reported. The valid values are:
• 'Off' = no notification of events is provided (default)
• 'Low' = 'Yes' in the Low Notification Level column
• 'Med' = 'Yes' in the Medium Notification Level column
• 'High' = 'Yes' in the High Notification Level column
• 'Ultra' = 'Yes' in the Ultra Notification Level column

NotifyMethod = 'notifymethod'
O
Optional attribute that specifies the method for reporting events. The
methods are:

• 'None' = no event logging is done (default).
• 'Msg' = sends the events to a log.
  • On Windows, the events are sent to the event log that can be viewed using the Event Viewer. The messages are sent to the Application log.
  • On Solaris, AIX, Linux, and HP-UX platforms, the destination of the events depends on the setting specified in the /etc/syslog.conf file.
  • On z/OS systems, events are sent to the job log.
• 'Exit' = sends the events to a user-defined notify exit routine.

NotifyString = 'notifystring'
O
Optional attribute that provides a user-defined string to precede all messages sent to the system log. This string is also sent to the user-defined notify exit routine. The maximum length of the string is:
• 80 bytes, if the NotifyMethod is 'Exit'
• 16 bytes, if NotifyMethod is 'Msg'


11.1 Teradata Parallel Transporter - Stream Operator - Basics

Stream Operator Capabilities:

  • Stream Operator is a Consumer Operator.

  • Stream Operator emulates the Teradata TPump utility.

  • Stream Operator is used to perform high-speed parallel Inserts, Updates, Deletes, and Upserts in near-real-time to one or more empty or preexisting Teradata Database tables without locking target tables.
    (Note that this locking behavior differs from the Load and Update operators.)

  • Row level locking allows load operations in the background during normal system use.

  • The Stream operator instances can update up to 127 tables on the Teradata Database.

  • Multiple parallel instances can be used to improve the performance of the update.






Stream Operator Definition:

Following is a Stream operator definition.

DEFINE OPERATOR STREAM_LOAD_OPERATOR
DESCRIPTION 'STREAM OPERATOR FOR Teradata PT'
TYPE STREAM
SCHEMA *
ATTRIBUTES
(
VARCHAR TdpId,
VARCHAR UserName,
VARCHAR UserPassword,
VARCHAR LogTable,
VARCHAR ErrorTable,        /* note that there is only one error table */
VARCHAR AppendErrorTable,
VARCHAR DropErrorTable,
VARCHAR PrivateLogName
);
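A hedged usage sketch for this operator (the target table, columns, attribute values, and the producer operator DATA_PRODUCER are all hypothetical):

APPLY
('INSERT INTO emp_tgt (empno, dept_num) VALUES (:empno, :dept_num);')
TO OPERATOR
(
STREAM_LOAD_OPERATOR[2]             /* two parallel instances */
ATTRIBUTES
(
TdpId        = 'mytdp',
UserName     = 'tduser',
UserPassword = 'tdpassword',
LogTable     = 'mydb.stream_log',
ErrorTable   = 'mydb.stream_et'
)
)
SELECT * FROM OPERATOR (DATA_PRODUCER[1]);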



TDPID:
An optional security attribute that specifies the name of the Teradata Database machine (non-mainframe platforms) or TDP (mainframe platforms) for the Stream operator.

If you do not specify a value for the TdpId attribute, the operator uses the default TdpId established for the user by the system administrator.
UserName:
Attribute that specifies the user name on the Teradata Database.
UserPassword:
Attribute that specifies the password associated with the user name.
PrivateLogName:
Optional attribute that specifies the name of a log that is maintained by the Teradata PT Logger inside the public log. The private log contains all of the messages produced by the operator.



LogTable:

A restart log table contains restart information written during job runs.
A log table is required for any execution of the Stream operator.

This is specified using the LogTable attribute.

Restarts can occur following any unexpected error on a database.

For example: if a table runs out of available space during a load operation, the job terminates, the table is paused, and a checkpoint is recorded in the restart log. Pausing the job in this way allows you to manually increase the available space, if needed, and then restart the job, because the load operation can resume from the last checkpoint in the restart log.





ErrorTable:

The ErrorTable is specified using the ErrorTable Attribute. Note that there is only 1 error table.

This table contains information concerning data conversion errors, constraint violations, and other error conditions.

This attribute must name a new table, unless the AppendErrorTable attribute is set to 'Yes'.

If a name is not supplied, the Stream operator creates one. Note, however, that user-supplied names for error tables must not exceed 30 bytes.

By default, the Stream operator drops the error table after data loading, unless the error table is non-empty.
However, the error table can be kept after data loading by setting the DropErrorTable attribute to 'No'.
Keeping the error table allows it to be reused for future jobs and saves the time required to drop the table.

By default, the Stream Operator will create the error table. If the error table already exists, Stream Operator will terminate the job with an error message. The exception is on a job restart. On a job restart, Stream Operator will continue.

  • When the value of the AppendErrorTable attribute is 'Yes', the Stream operator reuses the error table and displays the number of rows already existing in the error table.

  • If the error table does not exist, the Stream operator creates the error table before data loading and continues.

  • If the structure of the existing error table is not compatible with the error table the Stream operator expects, it terminates the job with an error message before data loading.









Error Limit:

Attribute that specifies the approximate number of records that can be stored in the error table before the Stream operator job is terminated.

This number is approximate because the Stream operator sends multiple rows of data at a time to Teradata Database. By the time Teradata PT processes the message indicating that the error limit has been exceeded, it may have loaded more records into the error table than the actual number specified in the error limit.

Restriction: The value must be greater than 0.

Invalid value: Causes the Stream operator job to terminate.

Default: The ErrorLimit value is unlimited.

Note: The ErrorLimit specification applies to each instance of the Stream operator.







Using Database Macros:

The Stream operator uses macros to modify tables rather than actual DML statements.

Before beginning a load, the operator creates macros to represent the DML statements.
The macros are then iteratively executed in place of the DML statements.

The Stream operator also removes the macros after all rows are loaded.

The MacroDatabase attribute specifies the database that contains the macros created by the Stream operator.

Note: If not specified, the database used is the database that contains the restart log table.







Multiple Statement Requests:

The most important technique used by the Stream operator to improve performance is the multiple statement request. Placing more statements in a single request is beneficial for two reasons.

It reduces:
• Network overhead, because large messages are more efficient than small ones.
• Robust-mode recovery overhead, to one extra restart log table row for each request. (In robust mode a log entry is made per request, so the larger the multistatement request, the fewer log entries are needed. If one of the SQL statements in the multistatement request fails, the entire request is rolled back.)

The Stream operator packs multiple statements into a request based upon the Pack attribute specification.







SERIALIZE:

The Serialize option only applies to the Stream operator.


We use the Serialize option when correct sequencing of transactions is required.


Using the Serialize option in APPLY statements, the Stream operator ensures that operations for a given row occur in the order they are received from the input stream. Note that the SERIALIZE option is specified in the APPLY statement.


Restriction: When using the Serialize option, only one instance of the Stream operator is allowed. Specifying more than one instance causes the Stream operator to terminate with an error.




Default:

Default is OFF unless the job contains an upsert operation, which causes Serialize to switch the default to ON.


When the Serialize option is set to OFF, transactions are processed in the order they are encountered, then they are placed in the first available buffer. Buffers are sent to parsing engine (PE) sessions and PEs process the data independently of other PEs. In other words, transactions might occur in any order.



Sequencing Key:

To use this option, associate a sequencing key (usually the primary index) with the target table. Each input data row is hashed based on the key to determine the session assigned to process each input row. This allows all rows with the same key to be processed in sequence by the same session, which is especially important if rows are distributed among many sessions.

If the Serialize option is set to ON, operations on a row occur serially in the order submitted.

The sequencing key of SERIALIZE ON is specified as one or more column names occurring in the input data SCHEMA definition. These SCHEMA columns are collectively referred to as the key.

Usually the key is the primary index of the table being updated, but it can be a different column or set of columns. For example:

APPLY
('UPDATE emp SET dept_name = :dept_name
WHERE empno = :empno;')
SERIALIZE ON (empno)
TO OPERATOR (TARGET_TABLE[1])

This APPLY statement guarantees that all data rows with the same key (empno) are applied to the database in the order they are received from the producer operator. In this case, the column empno is the primary index of the emp table.


Note that SERIALIZE ON is local to a specific DML statement. In the following example, a group DML is specified, but only the first statement uses the Serialize option:

APPLY
('UPDATE emp SET dept_num = :dept_num
WHERE empno = :empno; ')
SERIALIZE ON (empno)
('UPDATE dept SET dept_name = :dept_name
WHERE deptno = :deptno; ')
TO OPERATOR (TARGET_TABLE[1])

Following are some of the advantages of using the Serialize option, which might improve performance:

• SERIALIZE ON can eliminate the lock delays or potential deadlocks caused by primary index collisions coming from multiple sessions.
• SERIALIZE ON can also reduce deadlocks when rows with non-unique primary index values are processed.









Robust and Non-Robust Mode:

Robust mode causes every DML operation to be checkpointed and ensures on restart that no operation is applied more than once.

Robust mode requires more writes to the restart log, which might impact performance. However, using robust mode ensures that a restart avoids reprocessing rows that a normal interval checkpoint might otherwise necessitate.

Robust is the default mode for all Stream operator jobs. The Robust attribute turns the mode on or off.

Setting the attribute to “yes” tells the Stream operator to use robust restart logic.

VARCHAR Robust = 'Yes' (or 'Y')

Robust mode causes a row to be written to the log table each time a buffer successfully completes its updates.

Mini-checkpoints are written for each successfully processed row.
These mini-checkpoints are deleted from the log when a checkpoint is taken, and are used at restart to identify the rows that have been successfully processed, which permits them to be bypassed at restart.

The larger the pack factor, the less overhead is involved in this activity.

Choosing the Robust mode is particularly useful to avoid problems with data integrity.

Robust mode is recommended in the following situations to avoid having an adverse effect on restarts:

• INSERTs into multiset tables - Robust mode prevents a restart from inserting the same row a second time.

• UPDATEs based on calculations - Robust mode prevents the duplicate application of calculations.

• Large Pack factors - Robust mode does not involve the application and rejection of duplicate rows after restarts, which is a time-consuming process of logging errors to the error table.

• Time-stamped data - Robust mode prevents the possibility of stamping identical rows with different time stamps, resulting in duplicate rows.


Setting the attribute to “no” tells the Stream operator to use simple restart logic rather than robust logic.

VARCHAR Robust = 'No' (or 'N')

In a non-robust mode, restarts begin where the last checkpoint occurs in a job. Because some additional processing will most likely take place after the checkpoint is written, the requests that occur after the checkpoint are resubmitted by the Stream operator as part of the restart process.

Re-attempting updates can also be a problem if an update calculation, for example, is based on existing data in the row, such as adding 10% to an amount. Doing the update calculation a second time adds an additional 10% to the amount, thus compromising data integrity. For this type of update, it is best to use robust mode to ensure that no DML operation is applied more than once.
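For example, a hedged illustration (the table and columns are hypothetical) of a calculation-based update that is not safe to repeat:

/* Re-running this statement after a restart would add another 10%,
   compromising the balance; robust mode prevents the second application. */
UPDATE acct SET balance = balance * 1.10 WHERE acct_id = :acct_id;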

The non-robust (or simple restart) method does not involve the extra overhead that comes with the additional inserts to the restart log table that are needed for robust logic, so overall processing is notably faster.






Data Quality Affects Performance:

If data contains errors, a large Pack factor can slow performance because of the way Stream handles errors.

For example, if several hundred statements are packed, when an error occurs, the entire request is rolled back. The Stream operator then removes the error-producing statement(s) and then reissues the entire request. Such a process can be costly from a performance standpoint.





Statement Packing

To provide optimal performance, the Stream operator packs individual DML statements into a larger multistatement request based on the rate specified by the Pack attribute. This type of processing requires less overhead than multiple individual requests.

The Pack attribute specifies the number of statements in a multistatement request. Specifying a Pack rate improves network/channel efficiency by reducing the number of sends and receives between Teradata PT and the Teradata Database.

A maximum of 2400 statements can be specified.
 
Trial and error might be required to determine the best Pack rate for a Stream job. As the Pack rate is increased, the throughput improvement is usually great at first, then falls off.

For example: going from a Pack rate of 1 to 2 could provide huge performance gains, and going from 2 to 4 could be just as beneficial, but moving from 8 to 16 might cause a performance drop.


If the PackMaximum attribute is set to 'Yes', the Stream operator determines the maximum pack for the job, and then reports it.

Two factors to consider are:
 
• The maximum Pack factor based on Stream operator restrictions
• The optimal Pack factor for a particular job

These two factors might not be equal. The maximum rate lets you know the upper limit, but performance might improve at a smaller rate. For this reason, it is recommended that PACKMAXIMUM not be used for production jobs until you determine the optimal Pack factor.



Tuning the Pack Factor:

To determine the ideal pack factor to specify in the Pack attribute, first use the PackMaximum attribute by setting it to 'Yes'. Setting this attribute to Yes on the first job run sets up iterative interactions with the Teradata Database to heuristically determine the maximum possible pack factor.

At the end of the run, this value is displayed in the Stream operator’s logged output. Specify that determined value in the Pack attribute on subsequent runs. Set the PackMaximum to 'No'.
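A minimal sketch of the two-run tuning flow (the Pack value of 600 is a hypothetical value taken from the first run's log):

/* Run 1: let the operator discover and report the maximum pack factor */
VARCHAR PackMaximum = 'Yes',

/* Subsequent runs: specify the reported value and turn discovery off */
INTEGER Pack        = 600,
VARCHAR PackMaximum = 'No',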




Array Support

The Array Support feature allows DML requests containing only a single statement to be executed once for each of multiple rows of input data, each row specified being one of the members of the array. The DML must be contained within an APPLY statement that references the Stream operator and does not set ArraySupport to 'Off'.

Use of array support improves performance by:
• Increasing the amount of data that can be sent per request from a total of approximately 64KB to a total of approximately 1MB, with a limit of 64KB per input data row.



Latency Interval:

Latency is the interval value, expressed in seconds, between the flushing of stale buffers.
Latency interval is an option that is exclusively used by the Stream operator.

In normal operations (without latency), the Stream operator reads data from the data stream until its buffer is full, then it writes all buffered rows to the Teradata Database. The data is written to the Teradata Database only when the buffer is full or when a checkpoint is taken.

However, a latency interval (for example, set to 5400 seconds) causes the following:

• The Stream operator reads data from the data stream, and empties its buffer, writing the contents to the Teradata Database every 90 minutes (5400 seconds) regardless of whether it is full.
 
• If the buffer fills up within the time period (in this case, 90 minutes), it writes to the Teradata Database as it would during normal operation.

To set the latency interval, use the following syntax:

tbuild -l <LatencyInterval> -f <file name>

The value used for the latency interval must be a non-zero unsigned integer. The guiding factor is how stale you are willing to allow data to be.

For example, to run a continual load script with a latency interval of two hours, enter:
tbuild -l 7200 -f continualload

(This can be useful when rows arrive at a very slow rate; the latency interval sets the maximum time that can elapse before buffered rows are applied.)




Operational Considerations:

NoPI Tables
Operations other than INSERT on target tables defined as NoPI perform very poorly unless there is an appropriate secondary index path to the data rows.


Sessions and Instances:

Because the Stream operator uses Teradata SQL sessions in its communication with Teradata Database, it does not use load tasks. The Stream operator provides continuous updates using row level locking, allowing constant load operations in the background during normal system use.

Both a minimum and a maximum number of sessions can be used by the Stream operator.
The minimum specification is one. The default is one session for each operator instance.
The number of sessions is evenly distributed among the number of operator instances. If 20 sessions are requested and four instances of the operator are invoked, then each instance will receive five of the sessions.


Checkpointing and Restarting:

Checkpoint options control how often a row is written to the checkpoint file for the purposes of restarting a job. Unless otherwise specified, a checkpoint is taken at the start of and at the end of the input data. Since this process does not provide granular restartability in the case of longer running jobs, checkpoint intervals can be user-specified in terms of minutes or seconds.

• Seconds - Use the command line option -z. For example, the following command indicates that a checkpoint will be taken every 30 seconds:

tbuild -f <script file name> -z 30

• Minutes or seconds - Specify as part of a DEFINE JOB statement. For example:

DEFINE JOB test_job
SET CHECKPOINT INTERVAL 30 SECONDS
or
SET CHECKPOINT INTERVAL 10 MINUTES


DML Option Categories

The DML Options can be classified into four logical categories, which can be specified in any order:
• ARRAY SUPPORT
• SERIALIZE
• USE
• DML ERROR OPTIONS, which include:
  1. MARK
  2. IGNORE
  3. INSERT FOR

The following restrictions apply:
• Multiple ARRAY SUPPORT Options cannot be specified.
• Multiple SERIALIZE Options cannot be specified.
• Multiple USE Options cannot be specified.
• Multiple DML Error Options (MARK, IGNORE, INSERT FOR) can be specified in one block. However, these cannot be interleaved with the other three DML Option categories, as illustrated in the sketch below.
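A hedged sketch of these rules in a single APPLY statement (the DML, key column, and operator name are illustrative, and the error options use the common MARK/IGNORE spellings):

APPLY
('UPDATE emp SET dept_num = :dept_num WHERE empno = :empno;')
ARRAYSUPPORT ON                /* at most one ARRAY SUPPORT option */
SERIALIZE ON (empno)           /* at most one SERIALIZE option */
IGNORE MISSING UPDATE ROWS     /* DML error options grouped in one block, */
MARK DUPLICATE UPDATE ROWS     /* not interleaved with the other categories */
TO OPERATOR (STREAM_LOAD_OPERATOR[1]);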