How Teradata Parallel Transporter (TPT) Operation Differs from Existing Standalone Utilities
Though we can load and extract data using the standalone utilities, they are limited to a serial environment.
The advantage TPT has over the standalone utilities is that TPT uses data streams. A data stream acts as a pipeline between operators: data flows through it from one operator to the next.
Teradata PT supports the following types of parallelism:
A] Pipeline Parallelism
Teradata PT achieves pipeline parallelism by connecting operator instances through data streams within a single job.
In the figure above, the export operator extracts data from the source and writes it to a data stream. The filter operator reads data from that stream, processes it, and writes it to another data stream. The load operator reads this second stream and loads the data into the Teradata database.
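As a rough analogy only (this is not TPT script syntax or a TPT API), the three-operator job can be mimicked in Python: each operator is a thread, and each data stream is a bounded `queue.Queue`. The function names `export_operator`, `filter_operator` and `load_operator`, the `None` end-of-stream marker, and the buffer size of 10 are all hypothetical choices made for this sketch.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker passed down the pipeline

def export_operator(rows, out_stream):
    """Producer: reads rows from the source and writes them to a data stream."""
    for row in rows:
        out_stream.put(row)
    out_stream.put(SENTINEL)

def filter_operator(in_stream, out_stream, predicate):
    """Filter: reads one stream, keeps matching rows, writes to another stream."""
    while True:
        row = in_stream.get()
        if row is SENTINEL:
            out_stream.put(SENTINEL)  # pass the end marker downstream
            break
        if predicate(row):
            out_stream.put(row)

def load_operator(in_stream, target):
    """Consumer: reads the final stream and 'loads' rows into a target list."""
    while True:
        row = in_stream.get()
        if row is SENTINEL:
            break
        target.append(row)

def run_pipeline(source_rows, predicate):
    # Bounded queues stand in for data streams: a full queue makes the
    # upstream operator wait, so no intermediate file is ever written.
    stream1 = queue.Queue(maxsize=10)
    stream2 = queue.Queue(maxsize=10)
    target = []
    threads = [
        threading.Thread(target=export_operator, args=(source_rows, stream1)),
        threading.Thread(target=filter_operator, args=(stream1, stream2, predicate)),
        threading.Thread(target=load_operator, args=(stream2, target)),
    ]
    for t in threads:  # all three operators run concurrently
        t.start()
    for t in threads:
        t.join()
    return target

if __name__ == "__main__":
    rows = [{"id": i, "amount": i * 10} for i in range(1, 101)]
    loaded = run_pipeline(rows, lambda r: r["amount"] > 500)
    print(len(loaded))  # 50
```

Because the filter thread blocks on `get()` rather than on the export thread finishing, it begins work as soon as the first row appears on the first queue.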
Note that all three operators run concurrently.
The filter operator starts processing data as soon as the export operator starts putting data on the data stream, so we don't need to write the data to an intermediate file. This saves space, and it also saves time: with an intermediate file, the filter operation would have to wait for the extract to complete fully before starting. With TPT, the filter operation can process data as soon as it is available, without waiting for the export operator to finish.
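The time saving can be put in numbers with an idealized model, assuming the data moves in `n_blocks` equal blocks and each operator takes a fixed time per block. With an intermediate file, each stage must handle every block before the next stage starts; with a data stream the stages overlap, and once the first block is through, the slowest stage sets the pace. The function names and the stage times `(2, 1, 3)` are hypothetical illustrative values, not TPT measurements, and the model ignores buffer limits and startup overhead.

```python
def staged_time(n_blocks, stage_times):
    # Intermediate-file approach: each stage processes every block before
    # the next stage may begin, so the per-stage totals simply add up.
    return sum(n_blocks * t for t in stage_times)

def pipelined_time(n_blocks, stage_times):
    # Data-stream approach: the first block passes through every stage,
    # then one more block completes every max(stage_times) time units,
    # because the slowest stage is the bottleneck.
    return sum(stage_times) + (n_blocks - 1) * max(stage_times)

# Hypothetical per-block times for the export, filter and load operators.
stages = (2, 1, 3)
print(staged_time(100, stages))     # 600
print(pipelined_time(100, stages))  # 303
```

Under these assumptions, streaming roughly halves the elapsed time; for a single block the two approaches take the same time, since there is nothing to overlap.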
B] Data Parallelism
We can process larger quantities of data by partitioning the source data into a number of separate sets, with each partition handled by a separate instance of an operator.
Here, multiple instances of the producer operator run in parallel, extracting data simultaneously from the same data source and putting it on the data stream. Multiple instances of the consumer operator then read from this data stream and process the data simultaneously, which saves time.
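Again purely as an analogy (not TPT code), data parallelism can be sketched in Python with several producer threads and several consumer threads sharing one queue. The round-robin `partition` helper, the doubling "processing" step, and the other names are hypothetical; they are not how TPT necessarily splits a source or transforms rows.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, one per consumer

def partition(rows, n):
    """Split the source into n disjoint sets (round-robin, an illustrative choice)."""
    return [rows[i::n] for i in range(n)]

def producer(part, stream):
    """One producer instance: puts its own partition on the shared stream."""
    for row in part:
        stream.put(row)

def consumer(stream, results, lock):
    """One consumer instance: reads from the shared stream until told to stop."""
    while True:
        row = stream.get()
        if row is SENTINEL:
            break
        with lock:
            results.append(row * 2)  # placeholder processing step

def run_parallel(rows, n_producers, n_consumers):
    stream = queue.Queue(maxsize=100)  # the shared data stream
    results = []
    lock = threading.Lock()
    producers = [threading.Thread(target=producer, args=(p, stream))
                 for p in partition(rows, n_producers)]
    consumers = [threading.Thread(target=consumer, args=(stream, results, lock))
                 for _ in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:  # wait until every partition has been extracted
        t.join()
    for _ in consumers:  # then send one end marker to each consumer
        stream.put(SENTINEL)
    for t in consumers:
        t.join()
    return results

if __name__ == "__main__":
    out = run_parallel(list(range(1000)), n_producers=4, n_consumers=3)
    print(len(out))  # 1000
```

The end markers are sent only after all producers finish, so the number of producer and consumer instances can differ. Rows arrive in no fixed order, which is why any check on the results should sort them first.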