Sunday, 23 June 2013

Teradata 13 Feature - NO PRIMARY INDEX TABLES (NOPI)

  • A NOPI table is simply a table without Primary Index.
     

  • NOPI tables is a 13.0 Feature


  • When new rows are added to a NOPI  table , rows are always appended to the end of the table. They are not added in the middle of a hash sequence. Thus there is no over head of organizing / sorting rows by row hash.

Thus Note that with NOPI tables we don’t need to sort data on AMP's by Row hash.
(Having said this the data on AMP's for NOPI tables will automatically be in sequence of ROWID. This is because of the way uniqueness value is incremented when rows are appended)


  • When rows are appended to the end of the table the rows will have unique Row id. The uniqueness value is incremented for each row. Thus we have unique row id.


  • Rows are distributed by the random generator which will decide which row will go to which amp. Normally rows are almost equally distributed.


  • NOPI tables will be preferred for Staging tables as there wont be any skewness problems.


  • Uses of a NOPI table:

  1. Instead of creating a table with 1st column as the PI OR an unnatural PI of many columns we can use NOPI tables. This can be for intermediate tables that are used. They get created with 1st column as PI and lead to skewness and performance issues.
    It is possible to set the system defaults to create tables without PI.

  1. As a staging table used in mini batch loading technique. .i.e Fastload data into a staging table and then insert select from staging table to target table.

  1. Can be used as a Sandbox table where data can be inserted until an appropriate indexing method is defined.

  1. Can be used as a log File.


  • Creating a NOPI table:


For creating table with no primary index we specify 'NO PRIMARY INDEX' in the CREATE TABLE query.

CREATE TABLE <table-name> (column1 datatype, column2 datatype,…..)
NO PRIMARY INDEX;



  • Is NOPI table SET or MULTISET table?

NOPI table will always be multiset irrespective of the mode (Teradata or ANSI) mode it is created in.




  • How are UNIQUE and PRIMARY KEY constraints implemented with a NOPI table?

Earlier with a primary index table the UNIQUE OR PRIMARY KEY constraints would be implemented as Unique Primary Index Or Unique Secondary index depending upon whether PI is explicitly defined or not.
However with NOPI table UNIQUE OR PRIMARY KEY constraints would always be implemented as Unique Secondary Indexes.




  • What is the table kind for NOPI tables?

For normal tables the tablekind column in the DBC.TVM is 'T'.
But for NOPI tables the tablekind value is 'O'.

So if we want to look at all the tables in system the filter condition should be 'WHERE TABLEKIND IN ('O','T');

For finding all the NOPI tables following is the query:

SELECT * FROM DBC.TABLES WHERE TABLEKIND = 'O';



  • Can we copy the table definition and data from a normal table, but make the new table as a NOPI table?

Yes. We can do that using the following syntax.

CREATE TABLE Table1 AS Table2 WITH DATA NO PRIMARY INDEX.

All the definition from table2 will be copied to table1 including the secondary indexes.



  • Is it possible to create NOPI table as default?


In earlier releases if the PI is not specified a PI is chosen automatically by the system. However we can choose at the system level whether or not to create NOPI tables by default.

If we choose NOPI tables then by default when we don’t specify the PI, the table would be a NOPI table.

A new DBSControl Flag determines if the PI or NOPI table is created when we don’t specify following in the CREATE TABLE DDL.

  1. PRIMARY INDEX clause.
  1. NO PRIMARY index clause.
  1. PRIMARY KEY OR UNIQUE constraints.

This DBS control field is Field 53 and is named as 'Primary Index Default' .
 
Possible values for this field are:

  1. D --> Teradata Default. Currently Default is same as Option P.
  1. P --> "First Column NUPI". Creates tables with 1st column as NUPI.
  1. N --> "No Primary Index". Creates tables without PI.

However note that if this option is chosen as N, and we create a table without explicit PI but table has UNIQUE or PRIMARY KEY defined , then  UNIQUE and PRIMARY KEY take precedence over the 'N' option and the table is created with a Unique Primary index.

Having said this also note that if we explicitly specify 'NO PRIMARY INDEX' while creating the table, it overrides any PRIMARY KEY or UNIQUE constraints. They get implemented as USI and the table is a NOPI table.







  • How is NOPI table implemented?


The rows are distributed between AMP's using the Random Generator. With in a AMP rows are just added in a sequential order.
The Random generator is designed in such a way to balance out the rows between amps.

Normal tables have 64 bit ROWID (32 bit uniqueness value + 32 bit row hash).
NOPI tables will also have 64 bit ROWIDs ,but they are different than the normal table ROWIDs.

1st part of the ROWID is based on the hash bucket value(Hash bucket value can be 16 bit or 20 bit. With TD 12 and forward the typical hash bucket size will be 20 bits and that means over 1 mill hash bucket entries.) that is internally generated and controlled by AMP.
Typically All rows on the AMP will have same hash bucket value , but will have different uniqueness values.


There are two steps involved:

Step1 : The Random generator chooses the hash bucket which in turn determines the AMP to which the row is sent to.

Step 2: Internally each AMP selects a Hash bucket value that the AMP owns and uses  it as the 1st part of the row id (16 or 20 bits).




  1. How does each AMP receive the row. How does Random generator decide which AMP the row  should go to.

  1. The Random generator can be executed at the PE or at the AMP level depending on the type of request. .ie (SQL or Fast load)

  1. For  SQL based functions(Tpump and ad hoc queries) , the PE uses the random generator.
    The DBQL Query ID is used by the random generator to select the hash value.
     
  1. For Fast Load based functions, AMP uses the random generator to get the hash bucket value.
    In this case the AMP Vproc number + a counter is used as input to hash function to generate a random hash bucket value. This hash bucket value indicates the AMP to which the row will go to.



The remaining 44 bits is the uniqueness value. If the hash bucket is just 16 bits then the uniqueness value is 48 bits.

Thus  1st 20 bits(16 bits) --> Hash bucket value that is internally chosen and controlled by AMP's
Next  44 bits (or 48 bits) --> uniqueness value that starts with 1 and is sequentially incremented. With 44 bits there are approximately 17 trillion rows on a AMP.


Just as normal rows each row will have a row id prefix and rows will maintained in ROWID sequence.


Each row id is unique with in the table and with tableid the row becomes unique in the system and thus can be uniquely identified. Thus we can have secondary indexes and Join indexes defined on the table.

Fallback and index maintenance works same as if table is a primary index table.






  • Loading NOPI tables:

NOPI tables are generally used as Staging tables.
Loading a NOPI staging table is faster than loading a similar table with PI.

Fastload:
Larger data blocks are used to perform redistribution (4kb vs 64kb).

Also the sort step is completely eliminated.

In the acquisition phase we don’t have to redistribute rows, instead we redistribute blocks itself.

There is no data distribution for each individual row. This is beneficial for both fastload and Tpump array insert.

However there are certain considerations that need to be taken in mind. When using mini batch loading the data is fastloaded in to the staging table and then loaded into the target table using SQL INSERT SELECT to apply data from staging table to the target table.

Without PI the data can be loaded into the staging table faster. BUT when doing the insert- select the NOPI rows would have to be redistributed and merged into the target table. This may (not always) negate the time advantage we got with faster loads to the staging table.





  • How does FastLoad work for NOPI tables?

FastLoad works differently for NOPI.

  1. Acquisition phase:

    With normal tables the fastload has the break the blocks into individual rows and then hash them.
    With NOPI table the fastload does not have to break the blocks in individual rows. We would redistribute blocks between the amps.

Since the number of fastload sessions is less than the number of AMPs in the system , there will be some AMP's that do not have a session (deblocker) task. These AMP's will not receive any data from Fastload.

Therefore to avoid skewing each block of data received by the deblocker will be handed off to a randomly selected AMP before rows are inserted into the target.


Fastload being a client utility tries to group as many possible rows in blocks (64 k blocks).




  1. Endloading and Sort phase:

  • There is no sort phase involved. With normal tables the rows have to be sorted by the rowhash value.

  • For PI tables the duplicates are removed in the application phase. But with NOPI tables the duplicate rows are not discarded.

  • With PI tables the error table 2 is used to record UPI violations. This error table is not used by the NOPI tables. But it gets created and dropped at the end.






  • Accessing Table while NOPI table is being loaded:

With normal PI tables we cannot view the table data when its being loaded. This is because the sorting of the data has to happen and this does not happen till the end of the Phase 2. Thus the PI table is inaccessible during this time.

However with NOPI tables we can access the table contents while the NOPI table is being fast loaded. This is possible because  rows are always appended to the end of the NOPI table. There is no sort step involved.

Thus if the NOPI table was loaded without including the END LOADING statement following SQL will still work.

LOCKING ROW FOR ACCESS SELECT * FROM TABLE;




  • Accessing NOPI tables:

Typically we use NOPI tables as staging tables and therefore INSERT-SELECT, MERGE INTO or UPDATE FROM commands will be used to apply data from the staging table to the target PI table.

As with normal PI tables we can use SELECT to select data from the table . Without the PI this will be a full table scan unless we have secondary indexes created on the NOPI tables.
If secondary indexes are defined the access paths will work the same way as the PI table.

DELETE ALL (unconstrained delete) on a NOPI table will take the fastpath DELETE as it is done today on a PI table.
A constrained delete  (with a where clause ) will go thru full table scan on a NOPI table unless the secondary index access path in chosen.





  • Skewed NOPI table:

When we INSERT - SELECT from a NOPI table to a PI table , data distribution takes place.

But when we INSERT- SELECT from a PI table to a NOPI table the data from the source table will not be distributed and will be locally appended to the target table.

INSERT-SELECT to a target table can lead to a skewed NOPI table if the table is skewed.




  • Options available with NOPI:

  • NOPI tables can have FALLBACK and we can use ALTER statement to Add or Remove Fallback.
     
  • Secondary indexes (USI and NUSI) are allowed.
     
  • Join and reference indexes are allowed.
     
  • Primary key and Foreign Key constraints are allowed.
     
  • BLOBs and CLOBs are allowed.
     
  • INSERT and DELETE trigger options are available on a NOPI table. Update options will be available starting with TD 13.00.00.03.

  • NOPI tables can be Global Temporary table or a Volatile table.

  • COLLECT/DROP stats is allowed.



  • Limitations available with NOPI:

  • SET tables are not allowed. No duplicate row check happens purpose is to append data as fast as possible.

  • PPI tables are not allowed. NO PI, So question of having PPI

  • Permanent journaling is not allowed.

  • Identity column is no allowed.

  • Queue table or Error table is not allowed.

  • Hash indexes are not allowed (Secondary and Join indexes are allowed).

  • UPDATE, MERGE INTO or UPSERT using NOPI tables as the target is not allowed. UPDATE will be available in the Teradata 13.00.00.03





Very imp : Multiload is not supported for NOPI table as multiload makes use of PI for its operation.

Fast export, Archive runs exactly the same as that of the PI table.

Restoring or copying the table to a system with same number of AMP's works similar to PI table.
However a NOPI table can be skewed after Restore/copy if the target system has different configuration.

5 comments:

  1. Fantastic explanation..keep posting..

    ReplyDelete
  2. Excellent presentation...One doubt, when using fastload if rows are distributed by blocks and not as rows as explained above..how will be the rowhash generated by random generator? will it be for each row or to the entire data block? If the data is distributed by blocks how will INSERT-SELECT using SQL INSERT/SELECT possible(as it operates on row by row insertion principle)? I mean what will be the criteria to qualify a row from an amp?

    ReplyDelete
  3. How to avoid skew factor in a nopi table when loading it with large data? Is skew factor have an impact over the performance of the query such as elapsed time, CPU usage and ios

    ReplyDelete
  4. How to avoid skew factor in a nopi table when loading it with large data? Is skew factor have an impact over the performance of the query such as elapsed time, CPU usage and ios

    ReplyDelete