Saturday, 21 September 2013

1.3 Difference between TPT and Standalone utilities

How is Teradata Parallel Transporter operation different from the existing standalone utilities?

Though we can load and extract data using the standalone utilities, they are limited to a serial environment.


The advantage TPT has over the standalone utilities is that it makes use of data streams. A data stream acts as a pipeline between operators: data flows through it from one operator to the next.


Teradata PT supports the following types of parallel environment:

A] Pipeline Parallelism

Teradata PT pipeline parallelism is achieved by connecting operator instances through data streams during a single job.

In the above figure we can see that the export operator extracts data from the source and writes it to a data stream. The filter operator reads data from that data stream, processes it, and writes it to another data stream. This second data stream is read by the load operator, which loads the data into the Teradata database.

We should note that all three operators run their processes concurrently.
This means the filter operator starts processing data as soon as the export operator starts putting data on the data stream. Thus we don't need to write data to an intermediate file. This saves both space and time: with an intermediate file we would have to wait for the export to complete fully before starting the filter operation. With TPT, the filter operator can start processing as soon as data is available, without waiting for the export operator to finish.
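
The flow described above can be imitated outside Teradata to make the idea concrete. The following is only a rough Python sketch, not TPT script syntax: the operator functions, the `amount` filter rule and the queue sizes are all hypothetical, and in-memory queues stand in for data streams. It shows three stages running concurrently, with each stage picking up rows as soon as the previous stage emits them and no intermediate file in between.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def export_operator(rows, out_stream):
    # Producer: push each source row onto the first data stream.
    for row in rows:
        out_stream.put(row)
    out_stream.put(END)


def filter_operator(in_stream, out_stream):
    # Reads rows as soon as they arrive; keeps rows with a positive amount
    # (an illustrative rule, not anything TPT prescribes).
    while True:
        row = in_stream.get()
        if row is END:
            out_stream.put(END)
            break
        if row["amount"] > 0:
            out_stream.put(row)


def load_operator(in_stream, target):
    # Consumer: a plain list stands in for the target table.
    while True:
        row = in_stream.get()
        if row is END:
            break
        target.append(row)


def run_pipeline(rows):
    # Small bounded queues play the role of data streams between operators.
    stream_a = queue.Queue(maxsize=2)
    stream_b = queue.Queue(maxsize=2)
    target = []
    threads = [
        threading.Thread(target=export_operator, args=(rows, stream_a)),
        threading.Thread(target=filter_operator, args=(stream_a, stream_b)),
        threading.Thread(target=load_operator, args=(stream_b, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target
```

Because each queue holds at most two rows here, the export stage cannot run far ahead of the filter stage, which mirrors the point that no stage waits for the previous one to complete and nothing is staged on disk.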


B] Data Parallelism

We can process larger quantities of data by partitioning the source data into a number of separate sets, with each partition handled by a separate instance of an operator.




Here multiple instances of the producer operator run in parallel to extract more data simultaneously from the same data source and put it on the data stream. Multiple instances of the consumer operator then read from this data stream and process the data simultaneously, thus saving time.
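
As a rough illustration of the same idea, here is a small Python sketch (again an analogy, not TPT syntax). The partitioning scheme, the instance counts and the "double the value" processing step are all assumptions made for the example. Several producer instances each handle one partition of the source and write to a shared stream, while several consumer instances read from that stream in parallel.

```python
import queue
import threading


def producer(partition, stream):
    # One producer instance handles one partition of the source data.
    for row in partition:
        stream.put(row)


def consumer(stream, results, lock):
    # Each consumer instance takes whatever row is next on the shared stream.
    while True:
        row = stream.get()
        if row is None:  # None is used here as a stop signal
            break
        with lock:
            results.append(row * 2)  # illustrative processing step


def run_data_parallel(rows, n_producers=2, n_consumers=2):
    stream = queue.Queue()
    results = []
    lock = threading.Lock()
    # Split the source into one partition per producer instance.
    partitions = [rows[i::n_producers] for i in range(n_producers)]
    producers = [
        threading.Thread(target=producer, args=(part, stream))
        for part in partitions
    ]
    consumers = [
        threading.Thread(target=consumer, args=(stream, results, lock))
        for _ in range(n_consumers)
    ]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    # All rows are on the stream; send one stop signal per consumer.
    for _ in consumers:
        stream.put(None)
    for t in consumers:
        t.join()
    return results
```

The order of the results depends on how the instances interleave, so a caller that needs a stable order would have to sort them afterwards.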
