Sunday 17 March 2013

Indexes - Part 4 - JOIN indexes


JOIN Index:

  • JOIN index is a technique that is used to improve the performance of certain types of queries.


  • With NUSI the columns involved in the index can belong only to one table. With Join index the index structure contains columns from one or more tables.

 
  • Note that once this is created , it available only to optimizer. It’s the optimizer who decides whether to use join index or not. This index can never we directly accessed by the user.

 
  • JOIN index helps in joining tables by providing the data needed by using index itself and also by avoiding redistribution of data in many cases.

 
  • Following are the types of JOIN indexes:

  1. Multiple table Join index: This type of index is used to pre-join the tables, which can help prevent redistribution of data

  1. Single table Join index: This type of Join index is used to rehash and redistribute the rows of a single table based on specified columns

  1. Aggregate Join index: Aggregate join index is used to create summary table.

  • Example of Creating a multi table join index


CREATE JOIN INDEX JNINDEX
AS SELECT
departmentname,
d.DEPARTMENTNO,
employeeid,
salary,
hiredate
FROM
department d
Join
employee e
on d.departmentno=e.departmentno
primary index (departmentno);

Above query can also be written as

CREATE JOIN INDEX JNINDEX
AS SELECT
(departmentname,
d.DEPARTMENTNO),

(employeeid,
salary,
Hiredate)

FROM
department d
Join
employee e
on d.departmentno=e.departmentno
primary index (departmentno);

Functionality wise both the syntax mean the same , however in the 2nd syntax join index is comprised of a 'fixed' portion (first parenthesis) and a 'repeatable' portion (second parenthesis). This represents a denormalization of the data. Difference is internally.


Explain of the below query shows that the join index was used to cover the query , and hence result will be generated without accessing the base rows.

Explain SELECT departmentname,employeeid,salary
from department d Join employee e
on d.departmentno=e.departmentno

  1) First, we lock a distinct EDW_RESTORE_TABLES."pseudo table" for
     read on a RowHash to prevent global deadlock for
     EDW_RESTORE_TABLES.JNINDEX.
  2) Next, we lock EDW_RESTORE_TABLES.JNINDEX for read.
  3) We do an all-AMPs RETRIEVE step from EDW_RESTORE_TABLES.JNINDEX by
     way of an all-rows scan with no residual conditions into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with low confidence to be 1,440 rows (
     139,680 bytes).  The estimated time for this step is 0.01 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.01 seconds.


  • JOIN index once created we don’t need to maintain the index , RDBMS does that automatically which means that when the base rows change the join index is also changed automatically.
     

  • Note that when creating the JOIN index we have specified  a primary index. The primary index gets assigned irrespective of whether we explicitly specify one or not. Primary index is used to redistribute the index rows across the AMP's.

note : JOIN index can only have a NUPI and never UPI

  • The index rows on the  AMP's are sequenced by the hash value of the primary index of the join index. However this type of sequencing is not beneficial in range processing. Hence we have a option to use a ORDER BY clause to override the default sequencing.

Syntax:

CREATE JOIN INDEX JNINDEX
AS SELECT
departmentname,
d.DEPARTMENTNO,
employeeid,
salary,
hiredate
FROM
department d
Join
employee e
on d.departmentno=e.departmentno
ORDER BY HIREDATE
primary index (departmentno);

The rules for the column in ORDER BY are  similar to those of the value ordered NUSI's

  1. Single Column
     
  1. The Column used for ordering should be  part of or all of the fixed portion index definition.
     
  1. The column must be numeric column.
     
  1. Column should not be greater then 4 bytes in length - INT, SMALLINT, BYTEINT, DATE, DEC with storage length no greater than 4 bytes are valid.


  • JOIN indexes with NUSI defined.

    NUSI can be defined on a Join index to improve performance. We earlier saw that index  rows can be ordered by a particular column to facilitate range processing . But what if there are multiple columns that are used for range processing.

This can be solved by making use of join index and value ordering it.

NUSI's on Join Indexes can be built
  1. As part of the CREATE JOIN INDEX statement,
or
  1. they can be added after Join Index creation using the CREATE INDEX statement.

Example:

CREATE JOIN INDEX JNINDEX
AS SELECT
departmentname,
d.DEPARTMENTNO,
employeeid,
salary,
hiredate
FROM
department d
Join
employee e
on d.departmentno=e.departmentno
primary index (departmentno)
INDEX(hiredate) ORDER BY VALUES(hiredate);

Or

CREATE INDEX(hiredate) ORDER BY VALUES (hiredate) on JNINDEX;

  • Example of Single table indexes


Basic purpose single table Join indexes is to rehash and redistribute the rows of the table by column other than the primary index.

Assume a scenario where we join a two table and one of the two table needs to get distributed on the join column so that join can be performed. This would be time consuming if the table is very huge. However we can create a single table join index on this table with the column used for redistribution as the primary index of the join index. Thus rows will be pre-distributed and hence there wont be any re distribution while performing the join and thus will speed up the join.

CREATE JOIN INDEX emp_deptno 
AS SELECT employee_number, department_number 
FROM employee 
PRIMARY INDEX (department_number)
;

No comments:

Post a Comment