Sunday 22 September 2013

4.2 Teradata PT Performance - Sessions and Instances


4.2 Teradata PT Performance- Sessions and Instances

With in the TPT job scripts we can specify multiple instances of the operator at the point where the operator is referenced in an APPLY statement.

Operator instances then execute in parallel to complete the task.
Each of these instances in turn can login multiple sessions.

Thus multiple instances and multiple sessions each instance provides high degree of parallelism.

The following operators can be configured to enhance job performance through the optimization of instances and sessions:
• DataConnector (instances only)
• Export
• Load
• SQL Inserter (instances only)
• Stream (instances only)
• Update








4.2.1 Specifying Number of Sessions

By default each operator will attempt to simultaneously log on to one Teradata Database session for each AMP configured on the Teradata Database system.
For some jobs, the default parallelism may be excessive. In other cases, there may not be enough available AMPs to provide the sessions necessary to run the job efficiently.

This is the reason why we use the attributes - MaxSessions and MinSessions that support session limits.

  • MaxSessions determines the maximum number of sessions an operator can use.

  • MinSessions, determines the minimum number of sessions that must be available in order for the job to run.
Proper setting of the MinSessions attribute prevents the job from running until there are enough AMPs for it to run at an acceptable rate.




We should consider the following factors while specifying the MaxSessions attribute:

  • If no value is set for MaxSessions, the operator attempts to connect to one session per available AMP.

  • The DDL, ODBC, and SQL Selector operators are limited to a single concurrent session, that is, one session each per single-step job, or one each session per step in a multi-step job.

  • The number of sessions specified by the value of the operator MaxSessions attribute are balanced across the number of operator instances. For example, if the Update operator is defined with two instances, and the MaxSessions attribute is set to 4, each instance of the defined Update operator will run two sessions
If the value of the MaxSessions attribute for an operator is smaller than the number of operator instances, the job will abort.

  • If the value of MaxSessions is set to a number greater than the number of available AMPs, the job runs successfully, but logs on only as many sessions as available AMPs.

  • For some jobs, especially those running on systems with a large number of AMPS, the default session allocation (one per available Teradata Database system AMP) may not be advantageous, and you may need to adjust the MaxSessions attribute value to limit the number of sessions used.

  • The SQL Inserter operator supports only one session.

  • The Stream operator uses an SQL protocol, so it is not seen as a “load job” by the Teradata Database. Therefore, Stream operator connects to as many sessions as requested, up to the number of sessions allowed by the Teradata Database.













4.2.2 Specifying Multiple instances:

We can specify the number of instances for an operator in the APPLY TO or SELECT FROM statement .

Syntax : operator_name [number of instances]
Ex: APPLY <DML>...TO OPERATOR (UPDATE_OPERATOR [2]...)

It is very Important to note that Producer and Consumer Operators use multiple instances differently.

  • Producers automatically balance the load across all instances, pumping data into the data stream as fast as they can

  • By default, consumers will use only as many instances as needed. If one instance can read and process the data in the data stream as quickly as the producers can write it, then the other instances are not used. If the first instance cannot keep up with the producer operators then the second instance is engaged, and so on.

However this behavior of consumer operators can be changed by using the -c option.
This option indicates that the data blocks must be shipped to the target consumer operators in a cyclical, round-robin manner, providing a more even distribution of data to consumer operators.

If the number of instances is not specified, the default is 1 instance per operator.
Teradata PT will start as many instances as specified, but it uses only as many as needed. This might actually lead to wastage of system resources. So we must specify multiple instances  cautiously.
We should read the Teradata PT log file, which displays statistics showing how much data was processed by each instance. Reduce the number of instances if you see under-utilized instances of any operators. If all instances are used add more and see if the job runs better.

Imp Note:  The number of sessions that we specify will be shared by the number of instances. So if we specify 100 session and we have 2 instances then each of them gets 50.
If the number of instances exceeds the number of available sessions, the job aborts.

Therefore, when specifying multiple instances make sure the MaxSessions attribute is set to a high enough value that there is at least one session per instance.

No comments:

Post a Comment