Tuesday 14 May 2013

Teradata - Auto Generated Key retrieval feature

Auto Generated Key retrieval:

In all earlier examples there is no way to determine the value assigned to the identity column for the inserted row in the table.
The user would have to query the table to find that out.
However with the Auto-generated Key retrieval feature we would get to know the assigned identity values.

To enable this feature we use the command as follows from BTEQ.

.[SET] AUTOKEYRETRIEVE [OFF|COLUMN|ROW]

Where:
• OFF = Disabled. Same that earlier
• COLUMN = Will display only the IDCol.
• ROW = Enabled, will display entire row

Note that This is a BTEQ command and hence we cannot run this through SQL assistant.


Example:

Assume a table as shown below :

CREATE SET TABLE EDW_RESTORE_TABLES.EMPLOYEE4 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      EMPLOYEEID INTEGER GENERATED ALWAYS AS IDENTITY
           (START WITH 1
            INCREMENT BY 1
            MINVALUE 1
            MAXVALUE 4
            CYCLE),
      EMPLNAME CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( EMPLOYEEID );

/* Next is a simple insert. BY default the AUTOKEYRETRIEVE feature is OFF and hence it does not return back the identity column value */
 BTEQ -- Enter your DBC/SQL request or BTEQ command:
INSERT INTO EMPLOYEE4 VALUES(,'sukul');

INSERT INTO EMPLOYEE4 VALUES(,'sukul');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

/* Next we turn on the AUTORETRIEVE COLUMN feature*/
 BTEQ -- Enter your DBC/SQL request or BTEQ command:
.SET AUTOKEYRETRIEVE COLUMN;

.SET AUTOKEYRETRIEVE COLUMN;
 BTEQ -- Enter your DBC/SQL request or BTEQ command:
INSERT INTO EMPLOYEE4 VALUES(NULL,'shushant');

INSERT INTO EMPLOYEE4 VALUES(NULL,'shushant');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 EMPLOYEEID
-----------
          2

/* As shown above because the AUTORETRIEVE COLUMN feature was turned on only the value of identity column was returned */

/* Next we turn on the AUTORETRIEVE ROW feature*/
 BTEQ -- Enter your DBC/SQL request or BTEQ command:
.SET AUTOKEYRETRIEVE ROW;

.SET AUTOKEYRETRIEVE ROW;
 BTEQ -- Enter your DBC/SQL request or BTEQ command:
INSERT INTO EMPLOYEE4 VALUES(,'bhanu');

INSERT INTO EMPLOYEE4 VALUES(,'bhanu');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 EMPLOYEEID  EMPLNAME
-----------  --------------------
          3  bhanu

/* As shown above because the AUTORETRIEVE ROW feature was turned on the entire row was returned */





Considerations while using Auto generated key feature:

  • Inserts will have additional costs when auto generated key retrieval feature is requested.

  • This Feature works with a single INSERT or INSERT-SELECT, but not with upsert, merge into, multiload, fastload etc

  • Iterated INSERTs have to adhere to the 2048 spool limit of the Array Support feature. A max of 1024 iterations is possible as each iteration uses an AGKR spool and a response spool.

Teradata - Generated Identity Columns - Rules



Rules for using Identity columns:

  1. Identity column(IC) is considered as column attribute. We cannot drop or modify the IC attribute of the column. But we can drop the IC column from the table.

  1. An IC column cannot exist along with following attributes:
     
    • DEFAULT
    • BETWEEN
    • COMPRESS
    • CHECK
    • REFERENCES

  1. There can be only one IC column


  1. IC column cannot be a part of any of the following:

  • Composite indexes (primary or secondary)
  • Hash or Join Indexes
  • Partitioned Primary Indexes
  • Value-ordered Indexes

Note that IC column cannot be  apart of Composite index, but it can be the index as a whole.

Following statement is not allowed:

CREATE        TABLE EMPLOYEE4, NO FALLBACK
(
EMPLOYEEID INTEGER GENERATED BY DEFAULT AS IDENTITY (MAXVALUE 4 CYCLE MINVALUE 1),
EMPLNAME CHAR(20)
) PRIMARY INDEX (EMPLOYEEID,EMPLNAME); --> composite PI with IC column as part of the PI

This would fail with the error code: CREATE TABLE Failed. 5784:  Illegal usage of Identity Column EMPLOYEEID. 

However following is allowed:

CREATE        TABLE EMPLOYEE4, NO FALLBACK
(
EMPLOYEEID INTEGER GENERATED BY DEFAULT AS IDENTITY (MAXVALUE 4 CYCLE MINVALUE 1),
EMPLNAME CHAR(20)
) PRIMARY INDEX (EMPLOYEEID); --> IC column as PI itself.


  1. Column compression is not allowed on IC columns.

  1. IC are not supported by load utilities like Fast load or multi load.

  1. IC can be used with multisession BTEQ or Multistatement TPUMP

  1. Bulk inserts done via INSERT SELECT also cannot guarantee that the sequence of the assigned IC's will be unbroken. This is because each AMP pre-allocates a range of numbers based on a pre-defined interval (specified in the DBS Control Record). Consequently each AMP will provide its own sequence independently of the others.

  1. Imp: 'CREATE TABLE AS' feature does not work with Identity columns.

In the below example EMPLOYEE4 is a table with IC. We try to create another table EMPLOYEE5 as follows and the query fails

CREATE TABLE EMPLOYEE5 AS EMPLOYEE4 WITH DATA ;

Error Code: CREATE TABLE Failed. 5788:  CREATE TABLE AS currently does not support Identity Columns.  

Teradata - Generated Identity Column BY DEFAULT


Generating Values by DEFAULT:

Earlier we saw the 'GENERATED ALWAYS' option . Now we have a look at the 'GENERATED BY DEFAULT' option which  generate a value only when a value is not explicitly expressed.

CREATE        TABLE EMPLOYEE4, NO FALLBACK
(
EMPLOYEEID INTEGER GENERATED BY DEFAULT AS IDENTITY (MAXVALUE 4 CYCLE MINVALUE 1),
EMPLNAME CHAR(20)
) ;

Unlike earlier new values will be generated only if we don’t specify the value for the IC column explicitly.

INSERT INTO EMPLOYEE4 VALUES(,'sukul');
INSERT INTO EMPLOYEE4 VALUES(NULL,'shushant');
INSERT INTO EMPLOYEE4 VALUES(6,'uma');
INSERT INTO EMPLOYEE4 VALUES(,'bhanu');

EMPLOYEEID
EMPLNAME
2
shushant           
6
uma                
1
sukul              
3
bhanu              

Note that when we provide explicit value for the 'GENERATED BY DEFAULT' column it uses that value and does not generate a value on its own. However providing a value does it override the internal sequence. That’s the reason even when we inserted employee id of 6 , next sequence did not start from 7. instead it continued its own sequence and generated 3.

Also make a note that whenever an insert fails the 'generated' value keeps on incrementing.
If the value generated if 100 when an insert fails, the next successful insert will insert value as 101.

Also just as with 'GENERATED ALWAYS', when using 'GENERATED BY DEFAULT' does not reset the column value on emptying the table.

Teradata - Generated Identity Column ALWAYS - START WITH,MAXVALUE,MINVALUE,INCREMENT BY,CYCLE


Using ALWAYS, MINVALUE, MAXVALUE and CYCLE:


For test purpose we create a following table:

CREATE        TABLE EDW_RESTORE_TABLES.EMPLOYEE4, NO FALLBACK
(
EMPLOYEEID INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 4),
EMPLNAME CHAR(20)
) ;

Then we run following queries:

/* NULL using positional assignment*/
INSERT INTO EMPLOYEE4 VALUES(,'sukul');

/* providing NULL explicitly*/
INSERT INTO EMPLOYEE4 VALUES(NULL,'shushant');

/*passing a value explicitly for identity column. These values will be ignored*/
INSERT INTO EMPLOYEE4 VALUES(4,'uma');
INSERT INTO EMPLOYEE4 VALUES(1,'bhanu');

select * from EmPLOYEE4;

EMPLOYEEID
EMPLNAME
1
sukul              
2
shushant           
3
uma                
4
bhanu              


Note that when we use 'ALWAYS' the explicitly specified values for the identity column are ignored..


Now we try to insert one more row as follows:

INSERT INTO EMPLOYEE4 VALUES(NULL,'himanshu');

This query fails because of following reason: INSERT Failed. 5753:  Numbering for Identity Column EMPLOYEEID is over its limit. 

The reason the max value of 4 is already reached and we cannot insert more rows to the table now.

Things learnt : once the max value is reached we cannot new rows to the table(unless we have CYCLE option specified).

We can try to delete all the existing rows and then try to insert a new row using following queries:

DELETE FROM EMPLOYEE4;
INSERT INTO EMPLOYEE4 VALUES(NULL,'himanshu');

Even after deleting all rows the new insert fails with the same error code:  INSERT Failed. 5753:  Numbering for Identity Column EMPLOYEEID is over its limit. 

Things Learnt: Once Max value is reached (and CYCLE option is not specified) we cannot insert a new row even after deleting all the records from the table. This  means deleting data from table does not reset the max value reached.


--------------------------------------------------

Now we will drop the above table and recreate it using the CYCLE option as follows:

DROP table EMPLOYEE4;
 

CREATE        TABLE EMPLOYEE4, NO FALLBACK
(
EMPLOYEEID INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 4 CYCLE),
EMPLNAME CHAR(20)
) ;


And then make 5 inserts as below:
INSERT INTO EMPLOYEE4 VALUES(,'sukul');
INSERT INTO EMPLOYEE4 VALUES(NULL,'shushant');
INSERT INTO EMPLOYEE4 VALUES(4,'uma');
INSERT INTO EMPLOYEE4 VALUES(1,'bhanu');
INSERT INTO EMPLOYEE4 VALUES(NULL,'himanshu');


SELECT * FROM EMPLOYEE4;

EMPLOYEEID
EMPLNAME
-2147483647
himanshu           
1
sukul              
2
shushant           
3
uma                
4
bhanu              

Note that after we hit the max value of 4 , as we used the CYCLE option it goes to the minimum value.
Since we had not specified the min value it uses the minimum value supported by the data type an hence we get -2147483647 after rolling over from 4.


--------------------------------------

Assume that we have a table defined as follows:

CREATE        TABLE EMPLOYEE4, NO FALLBACK
(
EMPLOYEEID INTEGER GENERATED ALWAYS AS IDENTITY (STArt WITH 100 INCREMENT BY 2 MAXVALUE 104 CYCLE MINVALUE 10),
EMPLNAME CHAR(20)
) ;

And then we make following inserts:

INSERT INTO EMPLOYEE4 VALUES(,'sukul');
INSERT INTO EMPLOYEE4 VALUES(NULL,'shushant');
INSERT INTO EMPLOYEE4 VALUES(4,'uma');
INSERT INTO EMPLOYEE4 VALUES(1,'bhanu');
INSERT INTO EMPLOYEE4 VALUES(NULL,'himanshu');

Following is how the table looks like:

EMPLOYEEID
EMPLNAME
10
bhanu              
12
himanshu           
100
sukul              
102
shushant           
104
uma                

Note that 'Start WITH' indicates the value to begin with. In this case its 100. INCREMENT BY indicates the value by which the value should be increment.  In this case it 2.
Max value is 104 and min value is 10. When the value rolls over 104, it goes to 10(as min value is specified ,it does not go to data type default). Thus note that it can start with a value greater than the min value.


Care needs to be taken that CYCLE option should not create uniqueness violation or duplicate rows in the table. If there is a UPI defined on the generated column only then CYCLE option can be dangerous as it will inserts to  be rejected. Even if the define the PI as NUPI we need to take care that we don’t create exactly identical rows as they will be rejected by SET tables.


---------------------------------------------------------

Assume a table as below:

CREATE        TABLE EMPLOYEE4, NO FALLBACK
(
EMPLOYEEID INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 4 CYCLE MINVALUE 1),
EMPLNAME CHAR(20)
) ;

INSERT INTO EMPLOYEE4 VALUES(,'sukul');
INSERT INTO EMPLOYEE4 VALUES(NULL,'shushant');
INSERT INTO EMPLOYEE4 VALUES(4,'uma');

select * from EmPLOYEE4;

EMPLOYEEID
EMPLNAME
1
sukul              
2
shushant           
3
uma                

Now  we delete a row with employeeid=2

DELETE FROM EMPLOYEE4 where employeeid=2;

And then insert another row as follows:

INSERT INTO EMPLOYEE4 VALUES(1,'bhanu');

Following is how the table will look like :

EMPLOYEEID
EMPLNAME
1
sukul              
3
uma                
4
bhanu              

Thing learnt: Gaps are never filled. The column knows what was the last value and what value to put next.


Now we try to insert another row as follows

INSERT INTO EMPLOYEE4 VALUES(,'sukul');

The insert fails because of following reason: INSERT Failed. 2802:  Duplicate row error in EDW_RESTORE_TABLES.EMPLOYEE4. 

The reason is that the table is a set table and does not allow duplicate rows. Duplicate was generated because value got rolled off and came back to the minimum value of 1. And the table already had row of 1,'sukul'

Now we insert another row as follows:

INSERT INTO EMPLOYEE4 VALUES(NULL,'himanshu');

select * from EmPLOYEE4;

EMPLOYEEID
EMPLNAME
2
himanshu           
4
bhanu              
1
sukul              
3
uma                

Note that new record got the employee-id as 2. The earlier insert had failed, but still the value kept on incrementing. 

Monday 13 May 2013

Teradata - Generated Identity Columns Basic Part 1


Generated Identity Columns are used for automatic generation of column value depending upon prescribed sequencing and intervals.

Following are the Options that can be used with this feature:

GENERATED ALWAYS
Will always generate a value irrespective of whether the value is being specified or  not.


GENERATED BY DEFAULT
Will generate a value only if defaulting or NULL is specified for the column value.


START WITH
Value used to start the generated sequence. Default is 1.

INCREMENT BY
Interval between generated values. Default is 1.

MINVALUE
Smallest value that can be placed in this column. Default is the smallest value supported by the data type. (Ex: For integer data type it would be-2147483647)

MAXVALUE
Largest value that can be placed in this column. Default is the largest value supported by the data type.

CYCLE
After the max value has been reached restart using the min value. Note that it does not go back to the 'START WITH' value. Instead it goes to the min value. If no min value is explicitly specified then it goes the system default min value supported by that data type.
 


Imp Note : Only Numeric data types can be used for identity columns and they may be only whole  numbers:
 
  1. INTEGER
  1. SMALLINT
  1. BYTEINT
  1. DECIMAL
  1. NUMERIC

Teradata Sampling - Randomized sampling


Sampling with randomized allocation:

Sampling with randomized allocation provides ability to sample randomly across the entire system instead of each AMP returning an equal share of the sample.

In most cases, the difference in the data sampled with randomized versus non-randomized sampling will not be immediately apparent.

Following is how we request randomized allocation:

Select        employeeid,departmentno,sampleid
From        employee2
SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION
WHEN        departmentno=100 then 4,2
WHEN        departmentno=300 then 4,4
END        ;

Teradata Sampling-- SAMPLE WITH REPLACEMENT


Sample with Replacement:

Assume a table employee with just 14 rows in it.
Now if we execute the following query we will still get result of only 14 rows.

select        *
from        employee
sample         50;

Even though we have asked for sample of 50 rows , because the table has only 14 rows we will get only 14 rows in return. This is because SAMPLE by default does not allow duplication of the rows. Once the row is returned then it is taken out of the pool of rows.

However if we use the WITH REPLACEMENT option we get ability to sample the row and return it back to the pool of rows from which it came, making it candidate for samplings.
Ex:

select        *
from        employee
sample        with replacement 50;

Above query will return 50 rows even if it has to repeat the same rows again and again.

Note that here we write 'SAMPLE WITH REPLACEMENT 50' and not 'SAMPLE 50 WITH REPLACEMENT'.

We can combine stratified sampling and sampling with replacement as follows:

Ex 1:

Select        employeeid,departmentno,sampleid
From        employee2
SAMPLE WITH REPLACEMENT        
WHEN        departmentno=100 then .40
WHEN        departmentno=200 then .25
WHEN        departmentno=300 then .88
WHEN        departmentno=400 then .35


END        ;


Employeeid
DepartmentNo
SampleId
1
100
1
133
100
1
130
100
1
10
200
2
6
200
2
152
300
3
150
300
3
7
300
3
3
300
3
151
300
3
144
400
4
4
400
4
135
400
4


Ex 2:

Select        employeeid,departmentno,sampleid
From        employee2
SAMPLE WITH REPLACEMENT        
WHEN        departmentno=100 then 4,2
WHEN        departmentno=300 then 4,4
END        ;

Employeeid
DepartmentNo
SampleId
131
100
1
5
100
1
1
100
1
130
100
1
131
100
2
1
100
2
151
300
3
7
300
3
3
300
3
150
300
3
150
300
4
3
300
4
151
300
4
7
300
4

We can see that multiple rows are duplicated across samples.
Same row appears in sample 3 and sample 4.