Sunday 6 October 2013

5.2 Teradata Parallel Transporter - Data Connector attributes meaning

The meaning of each Data Connector operator attribute is described in detail below:

  1. FILENAME and DIRECTORYPATH


Use of the file name differs depending upon the operator type and operating system.

  • Data Connector producer operator

When the Data Connector operator is used as a producer, we can use the * wildcard character in the FileName attribute to process all the files in the UNIX directory (or, on the mainframe, all members in the PDS or PDSE).

On UNIX systems, the FileName attribute is limited to 255 characters.
We can specify the complete file name, for example:
VARCHAR FileName = '/home/sukul/inputfile.txt'
Or we can specify just the name of the file within its directory, as VARCHAR FileName = 'inputfile.txt'.
In this case the job looks for the optional DirectoryPath attribute. If DirectoryPath is specified as
VARCHAR DirectoryPath = '/home/holdy/', the file is expected at '/home/holdy/inputfile.txt'.
If no directory is defined in the optional DirectoryPath attribute, the file is expected to
be found in the default directory.
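
The lookup described above can be sketched in Python. This is only an illustration of the rule, not how Teradata PT implements it: the function name resolve_input_path and the default_dir parameter are hypothetical, and the default directory is assumed here to be the current working directory.

```python
import posixpath


def resolve_input_path(file_name, directory_path=None, default_dir="."):
    """Return the UNIX path a producer would read, per the rule above.

    Hypothetical helper: default_dir stands in for the job's default
    directory, which is an assumption of this sketch.
    """
    # A FileName that already contains "/" is taken as the whole path.
    if "/" in file_name:
        return file_name
    # Otherwise use DirectoryPath if given, else the default directory.
    base = directory_path if directory_path else default_dir
    return posixpath.join(base, file_name)


print(resolve_input_path("/home/sukul/inputfile.txt"))
print(resolve_input_path("inputfile.txt", "/home/holdy/"))
print(resolve_input_path("inputfile.txt"))
```

The sketch deliberately leaves out the case where a full path and DirectoryPath are both supplied, since that case produces a warning rather than a simple join.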


On z/OS, FileName can be:
  • A fully qualified data set name, or a PDS name including the member name
  • Only the member name of a PDS or PDSE script library
  • 'DD:<ddname>'
Note that when only the member name is specified for FileName, the (PDS or PDSE) data set containing the member must be named by the DirectoryPath attribute.

  • Data Connector consumer operator

When using the DataConnector operator as a consumer, the FileName attribute becomes the complete file specification, and the FileName cannot contain the wildcard character (*).


If the pathname that you specify with the FileName attribute (as filename) contains any embedded pathname syntax ("/" on UNIX or "\" on Windows), the pathname is accepted as the entire pathname. However, if the DirectoryPath attribute is present, the FileName attribute is ignored, and a warning message is issued.


z/OS:

FileName = '//'name.name(member)''
z/OS PDS DSN Name.Name(Member). Note the // used when specifying the name of the data set.

FileName = '//'name.name''
z/OS DSN (sequential) Name.Name

FileName = 'member'
z/OS PDS member expected to reside in the DSN that is defined in the DirectoryPath attribute.

FileName = 'DD:ddname'
z/OS DSN described in the JCL DD statement named "ddname".

UNIX:

FileName = '/tmp/user/filename'
UNIX pathname.

FileName = 'filename'
If the DirectoryPath attribute is undefined, filename is located in the default directory.

Windows:

FileName = '\\tmp\userfilename'
Windows path name.

FileName = 'filename'
Windows file name expected to be found in the directory defined in the DirectoryPath attribute.
If DirectoryPath is not defined, filename is located in the default directory.



  1. FILELIST

Expected values are 'Y' and 'N'.

Adding FileList = 'Y' indicates that the file identified by FileName contains a list of files to be processed as
input or used as containers for output.

Note that when we use the filelist we expect the filenames to be full path specifications.
If no directory name is included, the files are expected to be located within the current directory.
Supplying full paths for output files enables you to write files to multiple directories or disks.

Attributes that we cannot use with FileList:
  • DirectoryPath attribute
  • ArchiveDirectoryPath attribute

Important note: When the combination of the FileName and FileList attributes is used to control output, the supplied file list must have the same number of files as there are defined consumer instances; a mismatch results in a terminal error. At execution, rows are distributed to the listed files in a round-robin fashion if the tbuild -C option is used.
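
The two rules in this note (the file count must match the instance count, and rows go round-robin) can be modelled with a small Python sketch. It is not Teradata PT code: the function assign_rows and its parameters are hypothetical names used only to show the behaviour described above.

```python
from itertools import cycle


def assign_rows(rows, file_list, consumer_instances):
    """Distribute rows over the listed output files in round-robin order.

    Hypothetical model of the note above; a count mismatch is raised as
    an error to mirror the terminal error described there.
    """
    if len(file_list) != consumer_instances:
        raise ValueError(
            "file list has %d entries but %d consumer instances are defined"
            % (len(file_list), consumer_instances)
        )
    buckets = {name: [] for name in file_list}
    # cycle() repeats the file names endlessly; zip() stops with the rows.
    for row, name in zip(rows, cycle(file_list)):
        buckets[name].append(row)
    return buckets


out = assign_rows(["r1", "r2", "r3", "r4", "r5"], ["/d1/a.txt", "/d2/b.txt"], 2)
print(out)
```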



  1. FORMAT attribute

We specify this attribute to indicate the format of the input or output file.

Each Format value and its meaning:

Format = 'Binary'
Each record contains a two-byte integer data length (n) followed by n bytes of data.

Format = 'Formatted'
Each record is in the format traditionally known as FastLoad or Teradata format: a two-byte integer (n), followed by n bytes of data, followed by an end-of-record marker (X'0A' or X'0D0A').

Format = 'Text'
Each record is entirely character data: an arbitrary number of bytes followed by one of the following end-of-record markers:
• A single-byte line feed (X'0A') on UNIX platforms
• A double-byte carriage-return/line-feed pair (X'0D0A') on Windows platforms
When we specify Text, all the columns in the input/output file must be defined as CHAR.

Format = 'Delimited'
Each record is in variable-length text record format, with fields (columns) separated by one or more delimiter characters, as defined with the TextDelimiter attribute.
With this file format, all of the data types in the DEFINE SCHEMA must be VARCHAR.
The default TextDelimiter value is the pipe character (|).

Format = 'Unformatted'
The data does not conform to any predefined format. Instead, the data is entirely described by the columns in the schema definition of the DataConnector operator.
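
To make the 'Binary' layout concrete, here is a minimal Python sketch that splits a byte buffer into records using the two-byte length prefix. The function name read_binary_records is hypothetical, and the text above does not state the byte order of the length field, so little-endian is assumed here and exposed as a parameter.

```python
import struct


def read_binary_records(buf, byte_order="<"):
    """Split buf into records of the form: 2-byte length n, then n bytes.

    byte_order is a struct prefix ("<" little-endian, ">" big-endian).
    Little-endian is an assumption of this sketch, not a documented fact.
    """
    records = []
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated length prefix")
        (n,) = struct.unpack_from(byte_order + "H", buf, pos)
        pos += 2
        if pos + n > len(buf):
            raise ValueError("truncated record body")
        records.append(buf[pos:pos + n])
        pos += n
    return records


sample = struct.pack("<H", 3) + b"abc" + struct.pack("<H", 2) + b"hi"
print(read_binary_records(sample))  # [b'abc', b'hi']
```

A 'Formatted' record would differ only by the end-of-record marker that follows each body.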



  1. OPENMODE

Attribute that specifies the read/write access mode. Read means read-only access. Write means
write-only access. If a mode value is not specified for OpenMode, it defaults to Read for a
producer instance and Write for a consumer instance.


  1. TEXTDELIMITER and EscapeTextDelimiter attributes

Indicates the delimiter used in the input or output file.
The default is '|'. This means that if we don't specify this attribute, the delimiter is assumed to be |.
Delimiters can be multiple bytes.
To embed a delimiter in the data, precede it with the escape character, which is a backslash ( \ ) by default.
Use the EscapeTextDelimiter attribute to change the escape character to something other than the backslash
character ( \ ).
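
The escaping rule can be illustrated with a short Python sketch that splits one delimited record into fields. It is an assumption-laden model, not the operator's parser: split_delimited is a hypothetical name, and the only behaviour modelled is "escape followed by delimiter yields a literal delimiter".

```python
def split_delimited(record, delimiter="|", escape="\\"):
    """Split record on delimiter, treating escape+delimiter as literal data.

    Works with multi-character delimiters. Hypothetical helper that only
    models the rule described above.
    """
    fields, current, i = [], [], 0
    while i < len(record):
        if record.startswith(escape + delimiter, i):
            # Escaped delimiter: keep the delimiter text as field data.
            current.append(delimiter)
            i += len(escape) + len(delimiter)
        elif record.startswith(delimiter, i):
            # Unescaped delimiter: close the current field.
            fields.append("".join(current))
            current = []
            i += len(delimiter)
        else:
            current.append(record[i])
            i += 1
    fields.append("".join(current))
    return fields


print(split_delimited("a|b\\|c|d"))              # ['a', 'b|c', 'd']
print(split_delimited("x||y", delimiter="||"))   # ['x', 'y']
```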

  1. PRIVATELOGNAME

Optional attribute that specifies the name of a log that is maintained by the Teradata PT Logger inside the public log.

  1. AccessModuleName

Optional attribute that specifies the name of the access module file.

  1. AccessModuleInitStr

Optional attribute that specifies the initialization string for the specified access module.

  1. MultipleReaders

Use the MultipleReaders attribute to process a single, very large file with multiple instances of
the Data Connector operator.


  1. SkipRows and SkipRowsEveryFile

The SkipRows attribute expects an integer that indicates the number of rows to be skipped.

The SkipRowsEveryFile attribute accepts the values Y[es] and N[o].

For example, if SkipRows = 5 and SkipRowsEveryFile = ‘Y’, the system skips the first five rows
in every file and begins processing each file at the sixth row. You might use this method to
bypass header rows that appear at the beginning of each file.
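
The interaction of the two attributes can be modelled in a few lines of Python. The text above only describes SkipRowsEveryFile = 'Y'. For 'N', this sketch assumes SkipRows is treated as one cumulative count across the files in the order they are read; that is an assumption that should be checked against the Teradata PT reference. The function name rows_to_process is hypothetical.

```python
def rows_to_process(files, skip_rows=0, skip_rows_every_file="N"):
    """Return the rows left after skipping, given a list of per-file row lists.

    'Y': skip the first skip_rows rows of every file (as described above).
    'N': ASSUMPTION of this sketch: skip_rows is a single cumulative count
         consumed from the start of the first file onward.
    """
    every_file = skip_rows_every_file.upper().startswith("Y")
    remaining = skip_rows
    kept = []
    for rows in files:
        if every_file:
            kept.extend(rows[skip_rows:])
        else:
            drop = min(remaining, len(rows))
            remaining -= drop
            kept.extend(rows[drop:])
    return kept


files = [["header", "a", "b"], ["header", "c"]]
print(rows_to_process(files, 1, "Y"))  # ['a', 'b', 'c']
print(rows_to_process(files, 1, "N"))  # ['a', 'b', 'header', 'c']
```

Under that assumption, a header row in the second and later files is loaded as data when SkipRowsEveryFile = 'N', which is why 'Y' is the safer choice when every file carries its own header.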


1 comment:

  1. What if SkipRows = 1 and SkipRowsEveryFile = ‘N’ (There are more than 1 file). How will the loader know, for which file the row is to be skipped?
