- A NOPI table is simply a table without
Primary Index.
- NOPI tables is a 13.0 Feature
- When new rows are added to a NOPI table , rows are always appended to the end of the table. They are not added in the middle of a hash sequence. Thus there is no over head of organizing / sorting rows by row hash.
Thus Note that with NOPI tables we don’t need to sort data on
AMP's by Row hash.
(Having said this the data on AMP's for NOPI tables will
automatically be in sequence of ROWID. This is because of the way uniqueness
value is incremented when rows are appended)
- When rows are appended to the end of the table the rows will have unique Row id. The uniqueness value is incremented for each row. Thus we have unique row id.
- Rows are distributed by the random generator which will decide which row will go to which amp. Normally rows are almost equally distributed.
- NOPI tables will be preferred for Staging tables as there wont be any skewness problems.
- Uses of a NOPI table:
- Instead
of creating a table with 1st column as the PI OR an unnatural PI of many
columns we can use NOPI tables. This can be for intermediate tables that
are used. They get created with 1st column as PI and lead to skewness and
performance issues.
It is possible to set the system defaults to create tables without PI.
- As a staging table used in mini batch loading technique. .i.e Fastload data into a staging table and then insert select from staging table to target table.
- Can be used as a Sandbox table where data can be inserted until an appropriate indexing method is defined.
- Can be used as a log File.
- Creating a NOPI table:
For creating table with no primary index we specify 'NO PRIMARY
INDEX' in the CREATE TABLE query.
CREATE TABLE <table-name> (column1 datatype, column2
datatype,…..)
NO PRIMARY INDEX;
- Is NOPI table SET or MULTISET table?
NOPI table will always be multiset irrespective of the mode
(Teradata or ANSI) mode it is created in.
- How are UNIQUE and PRIMARY KEY constraints implemented with a NOPI table?
Earlier with a primary index table the UNIQUE OR PRIMARY KEY
constraints would be implemented as Unique Primary Index Or Unique Secondary
index depending upon whether PI is explicitly defined or not.
However with NOPI table UNIQUE OR PRIMARY KEY constraints would always be implemented as Unique Secondary
Indexes.
- What is the table kind for NOPI tables?
For normal tables the tablekind column in the DBC.TVM is 'T'.
But for NOPI tables the tablekind value is 'O'.
So if we want to look at all the tables in system the filter
condition should be 'WHERE TABLEKIND IN ('O','T');
For finding all the NOPI tables following is the query:
SELECT * FROM DBC.TABLES WHERE TABLEKIND = 'O';
- Can we copy the table definition and data from a normal table, but make the new table as a NOPI table?
Yes. We can do that using the following syntax.
CREATE TABLE Table1 AS Table2 WITH DATA NO PRIMARY INDEX.
All the definition from table2 will be copied to table1
including the secondary indexes.
- Is it possible to create NOPI table as default?
In earlier releases if the PI is not specified a PI is chosen
automatically by the system. However we can choose at the system level whether
or not to create NOPI tables by default.
If we choose NOPI tables then by default when we don’t specify
the PI, the table would be a NOPI table.
A new DBSControl Flag
determines if the PI or NOPI table is created when
we don’t specify following in the CREATE TABLE DDL.
- PRIMARY INDEX clause.
- NO PRIMARY index clause.
- PRIMARY KEY OR UNIQUE constraints.
This DBS control field is Field
53 and is named as 'Primary Index
Default' .
Possible values for this field are:
- D --> Teradata Default. Currently Default is same as Option P.
- P --> "First Column NUPI". Creates tables with 1st column as NUPI.
- N --> "No Primary Index". Creates tables without PI.
However note that if this option is chosen as N, and we create a table
without explicit PI but table has UNIQUE or PRIMARY KEY defined , then UNIQUE and PRIMARY KEY take precedence over
the 'N' option and the table is created with a Unique Primary index.
Having said this also note that if we explicitly specify 'NO
PRIMARY INDEX' while creating the table, it overrides any PRIMARY KEY or UNIQUE
constraints. They get implemented as USI and the table is a NOPI table.
- How is NOPI table implemented?
The rows are distributed between AMP's using the Random Generator. With in a AMP rows are just
added in a sequential order.
The Random generator is designed in such a way to balance out
the rows between amps.
Normal tables have 64 bit ROWID (32 bit uniqueness value + 32
bit row hash).
NOPI tables will also have 64 bit ROWIDs ,but they are different than the normal table ROWIDs.
NOPI tables will also have 64 bit ROWIDs ,but they are different than the normal table ROWIDs.
1st part of the ROWID is based on the hash bucket value(Hash
bucket value can be 16 bit or 20 bit. With TD 12 and forward the typical hash
bucket size will be 20 bits and that means over 1 mill hash bucket entries.)
that is internally generated and controlled by AMP.
Typically All rows on the AMP will have same hash bucket value , but
will have different uniqueness values.
There are two steps involved:
Step1 : The Random
generator chooses the hash bucket which in turn determines the AMP to which the
row is sent to.
Step 2: Internally each
AMP selects
a Hash bucket value that the AMP owns and uses
it as the 1st part of the row id (16 or 20 bits).
- How does each AMP receive the row. How does Random generator decide which AMP the row should go to.
- The Random generator can be executed at the PE or at the AMP level depending on the type of request. .ie (SQL or Fast load)
- For SQL based functions(Tpump and ad
hoc queries)
, the PE uses the random
generator.
The DBQL Query ID is used by the random generator to select the hash value.
- For
Fast Load based functions, AMP uses the random generator to get
the hash bucket value.
In this case the AMP Vproc number + a counter is used as input to hash function to generate a random hash bucket value. This hash bucket value indicates the AMP to which the row will go to.
The remaining 44 bits is the
uniqueness value. If the hash bucket is just 16 bits then the uniqueness
value is 48 bits.
Thus 1st 20 bits(16 bits) --> Hash bucket value
that is internally chosen and controlled by AMP's
Next 44 bits (or 48 bits) --> uniqueness value
that starts with 1 and is sequentially incremented. With 44 bits there are
approximately 17 trillion rows on a AMP.
Just as normal rows each row will have a row id prefix and rows
will maintained in ROWID sequence.
Each row id is unique with in the table and with tableid the row
becomes unique in the system and thus can be uniquely identified. Thus we can
have secondary indexes and Join indexes defined on the table.
Fallback and index maintenance works same as if table is a primary
index table.
- Loading NOPI tables:
NOPI tables are generally used as Staging tables.
Loading a NOPI staging table is faster than loading a similar
table with PI.
Fastload:
Larger data blocks are used to perform redistribution (4kb vs
64kb).
Also the sort step is completely eliminated.
In the acquisition phase we don’t have to redistribute rows, instead we redistribute blocks itself.
There is no data distribution for each individual row. This is beneficial for both fastload and Tpump array insert.
However there are certain considerations that need to be taken
in mind. When using mini batch loading the data is fastloaded in to the staging
table and then loaded into the target table using SQL INSERT SELECT to apply
data from staging table to the target table.
Without PI the data can be loaded
into the staging table faster. BUT when doing the insert- select the NOPI rows
would have to be redistributed and merged into the target table. This may (not
always) negate the time advantage we got with faster loads to the staging
table.
- How does FastLoad work for NOPI tables?
FastLoad works differently for NOPI.
- Acquisition phase:
With normal tables the fastload has the break the blocks into individual rows and then hash them. With NOPI table the fastload does not have to break the blocks in individual rows. We would redistribute blocks between the amps.
Since the number of fastload sessions is less than the number of
AMPs in the system , there will be some AMP's that do not have a session
(deblocker) task. These AMP's will not receive any data from Fastload.
Therefore to avoid skewing each block of data received by the
deblocker will be handed off to a randomly selected AMP before rows are
inserted into the target.
Fastload being a client utility tries to group as many possible
rows in blocks (64 k blocks).
- Endloading and Sort phase:
- There is no sort phase involved. With normal tables the rows have to be sorted by the rowhash value.
- For PI tables the duplicates are removed in the application phase. But with NOPI tables the duplicate rows are not discarded.
- With PI tables the error table 2 is used to record UPI violations. This error table is not used by the NOPI tables. But it gets created and dropped at the end.
- Accessing Table while NOPI table is being loaded:
With normal PI tables we cannot view the table data when its
being loaded. This is because the sorting of the
data has to happen and this does not happen till the end of the Phase 2. Thus
the PI table is inaccessible during this time.
However with NOPI tables we can
access the table contents while the NOPI table is being fast loaded.
This is possible because rows are always
appended to the end of the NOPI table. There is no sort step involved.
Thus if the NOPI table was loaded without including the END
LOADING statement following SQL will still work.
LOCKING ROW FOR ACCESS SELECT * FROM TABLE;
- Accessing NOPI tables:
Typically we use NOPI tables as staging tables and therefore
INSERT-SELECT, MERGE INTO or UPDATE FROM commands will be used to apply data
from the staging table to the target PI table.
As with normal PI tables we can use SELECT to select data from
the table . Without the PI this will be a full
table scan unless we have secondary indexes created on the NOPI tables.
If secondary indexes are defined
the access paths will work the same way as the PI table.
DELETE ALL (unconstrained delete) on a NOPI table will take the
fastpath DELETE as it is done today on a PI table.
A constrained delete (with a where clause ) will go thru full
table scan on a NOPI table unless the secondary index access path in chosen.
- Skewed NOPI table:
When we INSERT - SELECT from a NOPI table to a PI table , data
distribution takes place.
But when we INSERT- SELECT from a
PI table to a NOPI table the data from the source table will not be distributed
and will be locally appended to the target table.
INSERT-SELECT to a target table
can lead to a skewed NOPI table if the table is skewed.
- Options available with NOPI:
- NOPI tables
can have FALLBACK and we can use ALTER statement to Add or Remove
Fallback.
- Secondary
indexes (USI and NUSI) are allowed.
- Join and reference indexes are
allowed.
- Primary key and Foreign Key
constraints are allowed.
- BLOBs and CLOBs are
allowed.
- INSERT and DELETE trigger options are available on a NOPI table. Update options will be available starting with TD 13.00.00.03.
- NOPI tables can be Global Temporary table or a Volatile table.
- COLLECT/DROP stats is allowed.
- Limitations available with NOPI:
- SET tables are not allowed. No duplicate row check happens purpose is to append data as fast as possible.
- PPI tables are not allowed. NO PI, So question of having PPI
- Permanent journaling is not allowed.
- Identity column is no allowed.
- Queue table or Error table is not allowed.
- Hash indexes are not allowed (Secondary and Join indexes are allowed).
- UPDATE, MERGE INTO or UPSERT using NOPI tables as the target is not allowed. UPDATE will be available in the Teradata 13.00.00.03
Very imp :
Multiload is not supported for NOPI table as multiload makes use of PI for its
operation.
Fast
export, Archive runs exactly the same as that of the PI table.
Restoring
or copying the table to a system with same number of AMP's works similar to PI
table.
However
a NOPI table can be skewed after Restore/copy if the target system has
different configuration.
Fantastic explanation..keep posting..
ReplyDeleteExcellent presentation...One doubt, when using fastload if rows are distributed by blocks and not as rows as explained above..how will be the rowhash generated by random generator? will it be for each row or to the entire data block? If the data is distributed by blocks how will INSERT-SELECT using SQL INSERT/SELECT possible(as it operates on row by row insertion principle)? I mean what will be the criteria to qualify a row from an amp?
ReplyDeletePERFECT!!
ReplyDeleteHow to avoid skew factor in a nopi table when loading it with large data? Is skew factor have an impact over the performance of the query such as elapsed time, CPU usage and ios
ReplyDeleteHow to avoid skew factor in a nopi table when loading it with large data? Is skew factor have an impact over the performance of the query such as elapsed time, CPU usage and ios
ReplyDelete