Teradata FAQ's: Indexes - Part 4

JOIN Index:

JOIN index is a technique that is used to improve the performance of certain types of queries.

With NUSI the columns involved in the index can belong only to one table. With Join index the index structure contains columns from one or more tables.

Note that once this is created , it available only to optimizer. It’s the optimizer who decides whether to use join index or not. This index can never we directly accessed by the user.

JOIN index helps in joining tables by providing the data needed by using index itself and also by avoiding redistribution of data in many cases.

Following are the types of JOIN indexes:

Multiple table Join index: This type of index is used to pre-join the tables, which can help prevent redistribution of data

Single table Join index: This type of Join index is used to rehash and redistribute the rows of a single table based on specified columns

Aggregate Join index: Aggregate join index is used to create summary table.

Example of Creating a multi table join index

CREATE JOIN INDEX JNINDEX

AS SELECT 

departmentname,

d.DEPARTMENTNO,

employeeid,

salary,

hiredate

FROM 

department d 

Join 

employee e

on d.departmentno=e.departmentno

primary index
(departmentno);

Above query can also be written as

CREATE JOIN INDEX JNINDEX

AS SELECT 

(departmentname,

d.DEPARTMENTNO),

(employeeid,

salary,

Hiredate)

FROM 

department d 

Join 

employee e

on d.departmentno=e.departmentno

primary index
(departmentno);

Functionality wise both the syntax mean the same , however in the 2nd syntax join index is comprised of a 'fixed' portion (first parenthesis) and a 'repeatable' portion (second parenthesis). This represents a denormalization of the data. Difference is internally.

Explain of the below query shows that the join index was used to cover the query , and hence result will be generated without accessing the base rows.

Explain SELECT departmentname,employeeid,salary

from department d Join employee e

on d.departmentno=e.departmentno

  1) First, we lock
a distinct EDW_RESTORE_TABLES."pseudo table" for

     read on a
RowHash to prevent global deadlock for

    
EDW_RESTORE_TABLES.JNINDEX. 

  2) Next, we lock
EDW_RESTORE_TABLES.JNINDEX for read. 

  3) We do an
all-AMPs RETRIEVE step from EDW_RESTORE_TABLES.JNINDEX by

     way of an
all-rows scan with no residual conditions into Spool 1

     (group_amps),
which is built locally on the AMPs.  The
size of

     Spool 1 is
estimated with low confidence to be 1,440 rows (

     139,680
bytes).  The estimated time for this step
is 0.01 seconds. 

  4) Finally, we
send out an END TRANSACTION step to all AMPs involved

     in processing
the request.

  -> The contents
of Spool 1 are sent back to the user as the result of

     statement
1.  The total estimated time is 0.01
seconds. 

JOIN index once created we don’t need to maintain the index , RDBMS does that automatically which means that when the base rows change the join index is also changed automatically.

Note that when creating the JOIN index we have specified a primary index. The primary index gets assigned irrespective of whether we explicitly specify one or not. Primary index is used to redistribute the index rows across the AMP's.

note : JOIN index can only have a NUPI and never UPI

The index rows on the AMP's are sequenced by the hash value of the primary index of the join index. However this type of sequencing is not beneficial in range processing. Hence we have a option to use a ORDER BY clause to override the default sequencing.

Syntax:

CREATE JOIN INDEX JNINDEX

AS SELECT 

departmentname,

d.DEPARTMENTNO,

employeeid,

salary,

hiredate

FROM 

department d 

Join 

employee e

on d.departmentno=e.departmentno

ORDER BY HIREDATE

primary index
(departmentno);

The rules for the column in ORDER BY are similar to those of the value ordered NUSI's

Single Column

The Column used for ordering should be part of or all of the fixed portion index definition.

The column must be numeric column.

Column should not be greater then 4 bytes in length - INT, SMALLINT, BYTEINT, DATE, DEC with storage length no greater than 4 bytes are valid.

JOIN indexes with NUSI defined.

NUSI can be defined on a Join index to improve performance. We earlier saw that index rows can be ordered by a particular column to facilitate range processing . But what if there are multiple columns that are used for range processing.

This can be solved by making use of join index and value ordering it.

NUSI's on Join Indexes can be built

As part of the CREATE JOIN INDEX statement,

they can be added after Join Index creation using the CREATE INDEX statement.

Example:

CREATE JOIN INDEX JNINDEX

AS SELECT

departmentname,

d.DEPARTMENTNO,

employeeid,

salary,

hiredate

FROM

department d

Join

employee e

on d.departmentno=e.departmentno

primary index (departmentno)

INDEX(hiredate) ORDER BY VALUES(hiredate);

CREATE INDEX(hiredate) ORDER BY VALUES (hiredate) on JNINDEX;

Example of Single table indexes

Basic purpose single table Join indexes is to rehash and redistribute the rows of the table by column other than the primary index.

Assume a scenario where we join a two table and one of the two table needs to get distributed on the join column so that join can be performed. This would be time consuming if the table is very huge. However we can create a single table join index on this table with the column used for redistribution as the primary index of the join index. Thus rows will be pre-distributed and hence there wont be any re distribution while performing the join and thus will speed up the join.

CREATE JOIN INDEX emp_deptno

AS SELECT employee_number, department_number

FROM employee

PRIMARY INDEX (department_number)

;

Teradata FAQ's

links

Blog Archive

Sunday, 17 March 2013

Indexes - Part 4 - JOIN indexes

No comments:

Post a Comment