4.2 Teradata PT Performance-
Sessions and Instances
With in
the TPT job scripts we can specify multiple instances of the operator at the
point where the operator is referenced in an APPLY statement.
Operator
instances then execute in parallel to complete the task.
Each of
these instances in turn can login multiple sessions.
Thus
multiple instances and multiple sessions each instance provides high degree of
parallelism.
The
following operators can be configured to enhance job performance through the
optimization of instances and sessions:
•
DataConnector (instances only)
•
Export
• Load
• SQL
Inserter (instances only)
•
Stream (instances only)
•
Update
4.2.1 Specifying Number of
Sessions
By
default each operator will attempt to simultaneously log on to one Teradata
Database session for each AMP configured on the Teradata Database system.
For
some jobs, the default parallelism may be excessive. In other cases, there may
not be enough available AMPs to provide the sessions necessary to run the job
efficiently.
This is
the reason why we use the attributes - MaxSessions and MinSessions that support
session limits.
- MaxSessions determines the maximum number of sessions an operator can use.
- MinSessions, determines the minimum number of sessions that must be available in order for the job to run.
Proper setting of the MinSessions attribute prevents the job
from running until there are enough AMPs for it to run at an acceptable rate.
We
should consider the following factors while specifying the MaxSessions
attribute:
- If no value is set for MaxSessions, the operator attempts to connect to one session per available AMP.
- The DDL, ODBC, and SQL Selector operators are limited to a single concurrent session, that is, one session each per single-step job, or one each session per step in a multi-step job.
- The number of sessions specified by the value of the operator MaxSessions attribute are balanced across the number of operator instances. For example, if the Update operator is defined with two instances, and the MaxSessions attribute is set to 4, each instance of the defined Update operator will run two sessions
If the value of the MaxSessions attribute for an operator is
smaller than the number of operator instances, the job will abort.
- If the value of MaxSessions is set to a number greater than the number of available AMPs, the job runs successfully, but logs on only as many sessions as available AMPs.
- For some jobs, especially those running on systems with a large number of AMPS, the default session allocation (one per available Teradata Database system AMP) may not be advantageous, and you may need to adjust the MaxSessions attribute value to limit the number of sessions used.
- The SQL Inserter operator supports only one session.
- The Stream operator uses an SQL protocol, so it is not seen as a “load job” by the Teradata Database. Therefore, Stream operator connects to as many sessions as requested, up to the number of sessions allowed by the Teradata Database.
4.2.2 Specifying Multiple
instances:
We can
specify the number of instances for an operator in the APPLY TO or SELECT FROM
statement .
Syntax
: operator_name [number of instances]
Ex:
APPLY <DML>...TO OPERATOR (UPDATE_OPERATOR [2]...)
It is
very Important to note that Producer and Consumer Operators use multiple
instances differently.
- Producers automatically balance the load across all instances, pumping data into the data stream as fast as they can
- By default, consumers will use only as many instances as needed. If one instance can read and process the data in the data stream as quickly as the producers can write it, then the other instances are not used. If the first instance cannot keep up with the producer operators then the second instance is engaged, and so on.
However this behavior of consumer operators can be changed by
using the -c
option.
This option indicates that the data blocks must be shipped to
the target consumer operators in a cyclical, round-robin manner, providing a
more even distribution of data to consumer operators.
If the
number of instances is not specified, the default is 1 instance per operator.
Teradata
PT will start as many instances as specified, but it uses only as many as
needed. This might actually lead to wastage of system resources. So we must
specify multiple instances cautiously.
We
should read the Teradata PT log file, which displays statistics showing how
much data was processed by each instance. Reduce the number of instances if you
see under-utilized instances of any operators. If all instances are used add
more and see if the job runs better.
Imp Note: The number of sessions that we specify will
be shared by the number of instances. So if we specify 100 session and we have
2 instances then each of them gets 50.
If the
number of instances exceeds the number of available sessions, the job aborts.
Therefore,
when specifying multiple instances make sure the MaxSessions attribute is set
to a high enough value that there is at least one session per instance.
No comments:
Post a Comment