Saturday 30 March 2013

Temp Table - Part 4 - Global temporary tables



Global Temporary tables

  • The base definition for the global temporary table is created using CREATE GLOBAL TEMPORARY TABLES syntax.
The base definitions are stored in data dictionary.

  • The instances of the global temporary tables are materialized using any of the following SQL's

  1. INSERT
     
  2. INSERT SELECT
     
  3. DROP STATISTICS
     
  4. COLLECT STATISTICS
     
  5. CREATE INDEX .. ON TEMPORARY ….
     
  6. DROP INDEX … ON TEMPORARY …

  • Users temp space is used by the materialized instances.

  • Global temporary tables survive system restarts.

  • Each instance of global temp table is local to the session.

  • Each instance of the global temporary table can be dropped manually. If not the instance will get dropped automatically at the end of the session. Note that only the materialized instance gets dropped and not the base definition.

  • Global table instances start out empty initially.

  • Like Volatile tables , Global table also have LOG/NO LOG and ON COMMMIT PRESERVE/DELETE  ROWS options available  on them.

  • Each of the session can have its own materialized instance. However data in the both instances cannot be shared.

  • Example of creating a Global temporary table definition


CREATE GLOBAL TEMPORARY TABLE G_TEMP, NO FALLBACK , NO LOG
(
Empid integer,
Salary decimal(10,2)
Deptno integer
) ;

The ON COMMIT DELETE ROWS clause is the default, so it does not need to appear in the CREATE TABLE statement. If you want to use the command ON COMMIT PRESERVE ROWS, you must specify that in the CREATE TABLE statement. 

We can use ALTER TABLE syntax to change the defaults as show below

ALTER TABLE G_TEMP, ON COMMIT PRESERVE ROWS;

  • For every materialized instance there is a row inserted in the table DBC.Temptables

select * from DBC.Temptables;


HostNo
SessionNo
TableId
BaseDbId
BaseTableId
AccountDbId
StatisticsCnt
1
800401
15-80-91-75-00-00
00-00-D4-05
D7-02-12-9E-00-00
00-00-3B-BE
5
1
800401
15-80-96-75-00-00
00-00-D4-05
D7-02-33-A1-00-00
00-00-3B-BE
-1
1
800401
15-80-97-75-00-00
00-00-D4-05
D7-02-D0-A1-00-00
00-00-3B-BE
-1
1
800401
15-80-9C-75-00-00
00-00-D4-05
29-02-B8-CE-00-00
00-00-3B-BE
1
1
800401
15-80-9D-75-00-00
00-00-D4-05
2B-02-71-CA-00-00
00-00-3B-BE
1
1
800401
15-80-A5-75-00-00
00-00-D4-05
E5-02-1E-3A-00-00
00-00-3B-BE
-1

Above shows at the moment there are 6 materialized instances of global temporary tables.
All of them belong to same session 800401.

  • Logically all three types of tables- Derived, Volatile and Global temporary are types of temporary tables. However in Teradata terms, temporary table means Global Temporary tables.
The keyword TEMPORARY is used in SQL only for global temporary tables.

  • Dropping Global temporary tables

  1. Dropping only the instance

DROP TEMPORARY TABLE sales;

The keyword TEMPORARY in the command indicates the command applies to the instance.

  1. DROP TABLE sales;

This drops the base table definition and local instance if present .
However this will fail if there are any other instances on the system under any other user id.

  1. DROP TABLE sales ALL;

This syntax drops the base definition and also all the existing instances for all the users.
However this will fail if any instance is in transaction.

ALL and TEMPORARY are mutually exclusive.

  • SHOWing table definition:

  1. Seeing DDL of the base definition

    SHOW TABLE sales;

  1. Seeing the DDL of the temporary table instance

SHOW TEMPORARY TABLE sales;

Again the rule is that keyword TEMPORARY applies to the instance of the table.

  • We can use ALTER table query to change the definition of the base, But it will not work if there are any materialized instances.

  • GRANT and REVOKE commands apply only to the base definitions. User needs to have access to the base definition to materialize the instances. Once the instances are created no rights are checked.

  • HELP INDEX

  1. HELP INDEX sales;

Shows the indexes defined on the base .

  1. HELP TEMPORARY INDEX sales;

Shows the indexes defined on the materialized instance. The materialized instances can have different indexes than the base in case they are created or dropped on the instances itself after its materialized.

  • HELP STATS

  1. HELP STATS sales;

Shows statistics defined on the base definition.

  1. HELP TEMPORARY STATS sales;

Shows stats defined on the materialized instances.

Additional stats could be defined on materialized instances.

  • Creating secondary indexes

  1. CREATE INDEX (deptno) on Sales;

This creates an index on the base definition.
The query will fail if we have any materialized instances.
We cannot have a named index on global tables.

  1. CREATE INDEX (deptno) ON TEMPORARY sales;

This creates index on the materialized instance only.

  • Dropping Secondary indexes:

  1. DROP INDEX (deptno) ON Sales;

Drops index on the base definition.


  1. DROP INDEX (deptno) ON TEMPORARY Sales;

Drops the materialized instance of Sales.


  • Collecting stats:

We can define stats on the base table or collect stats on the materialized instances.
When done on base table, it actually defines stats.
When done on the instances, it collects the stats and stored in data dictionary.

  1. COLLECT STATS ON sales COLUMN (deptno);

Defines the Statistics on the column deptno.
This would fail if there are materialized instances.


  1. COLLECT STATS ON TEMPORARY sales COLUMN(deptno)

Collect stats on the instance.

  • DROP stats

We can drop stats on the base definition or the materialized instances

  1. DROP STATS ON sales COLUMN(deptno);

Drops stats defined on the base table. This would fail if there are any materialized instances.

  1. DROP STATS ON TEMPORARY sales COLUMN(deptno);

Drops stats on the materialized instances.

Temp Tables - Part 3 - Volatile tables


Volatile temporary tables:

  • Uses Spool space.

  • No data dictionary access needed.
     
  • Table definition is kept in cache.
     
  • Table is local to session and not the query.
     
  • Table can be used multiple times with in the session.
     
  • Volatile tables can be dropped any with within the session using DROP TABLE. However if we don’t the table will get dropped automatically at the end of the session.
     
  • The volatile table must be explicitly created using the CREATE VOLATILE TABLE syntax.
     
  • Volatile tables don’t survive system restart.

Example of creating volatile table.

CREATE VOLATILE TABLE V_TEMP, NO FALLBACK
(
Empid integer,
Salary decimal(10,2)
Deptno integer
) ON COMMIT PRESERVE ROWS;

Note the highlighted 'ON COMMIT PRESERVE ROWS'  allows us to use the same volatile table again and again within the session.

  • BY default its 'ON COMMIT DELETE ROWS' , which means the data will be deleted after the query is committed.

  • For volatile tables we can also request a NO LOG option which means the transaction journal will not be used.

Example

CREATE VOLATILE TABLE V_TEMP, NO FALLBACK , NO LOG
(
Empid integer,
Salary decimal(10,2)
Deptno integer
) ON COMMIT PRESERVE ROWS;

LOG is default. Which means transaction journal will be maintained.

  • Irrespective of whether we explicitly specify or not the volatile tables are created under userid logged in.

CREATE VOLATILE TABLE username.table1 -->      (Explicit)
CREATE VOLATILE TABLE table1              -->  (Implicit)
CREATE VOLATILE TABLE databasename.table1 -->  This will give is an error message if the databasename specified is not actually the username.

  • Different sessions can use the same volatile table name. But a volatile table cannot use a name that is used by any of the following objects under the user id.

Permanent tables.
Temporary tables.
Views.
Macros.


  • We can create volatile tables with FALLBACK, however as these tables don’t survive system restart having fallback does not add much value. On the contrary they would take twice the spool space.

  • We cannot use following while creating Volatile tables

  1. Permanent Journaling

  1. Referential integrity. Referential integrity means relation between tables, which is stored in DBC. As volatile table don’t need data dictionary we cannot have referential integrity

  1. Check constraints

  1. Column compression.

CREATE VOLATILE TABLE         TEST1         ( salary integer compress 0 ) on commit preserve rows;

Above query would fail with below message:

CREATE TABLE Failed. 3706:  Syntax error: COMPRESS option not allowed for a volatile table.

  1. Default values for columns

CREATE VOLATILE TABLE         TEST1         ( salary integer default 0 ) on commit preserve rows;

Above query would fail with below error message:

CREATE TABLE Failed. 3706:  Syntax error: DEFAULT option not allowed for a volatile table. 

  1. Column titles

  1. Named indexes.

  • We cannot use HELP DATABASE command to find all the volatile tables under a userid. Reason being volatile tables are not stored in data dictionary

To do that we have to use the command HELP VOLATILE TABLE ;

This will show all the volatile tables under a particular user id.

HELP VOLATILE TABLE;

Table Name
Table Id
TEST1                        
06C4AA600000


  • Following are the commands that we cannot run on VT's

  1. HELP and COLLECT STATS: This is no longer true with TD13. With TD13 we can collect stats on volatile table.

  1. CREATE/DROP INDEX : we cannot have indexes on volatile table.

  1. ALTER TABLE.

ALTER TABLE TEST1 ADD  salary decimal(10,2)

Following is the error we would get

ALTER TABLE Failed. 5341:  Volatile table 'TEST1' not allowed in statement. 

  1. GRANT and REVOKE privileges

  • Volatile tables cannot be renamed.

  • Volatile tables cannot be loaded using multiload.

Temp Table - Part 2 - Derived Tables


Derived table:

Example of using Derived table

Assume you have a table daily_sales and suppose we want to produce ranking of the sum of sales of products .
As we know we cannot use OLAP and aggregate functions together in a single query.

Hence we use a derived table as shown in the below query. The derived table contains the sum of sales per product and then the outer query uses the this result for performing ranking.

Select prodid,RANK(sumsales) from
(Select product_id, sum(sales) from
daily_sales group by product_id) AS D_TABLE(prodid,sumsales);

In the above query the derived table name is D_TABLE.
For derived table the select query should be within the parentheses following the FROM clause.
We can optionally provide names to the columns in the derived table. In the above example we named the derived columns as prodid and sumsales.

Temp Tables- Part 1 - Introduction



There are 3 types of temp tables:
  1. Global temp tables
  1. Volatile tables
  1. Derived tables


What are the disadvantages of using permanent tables for temporary purposes:
  1. We need to have separate steps to create and populate tables
  1. It takes perm space to create these tables.
  1. We need to drop the table explicitly after we are done using the tables.
  1. Data dictionary access is needed for creating and dropping the tables.



Derived Tables
Volatile temp tables
Global temporary tables
    1. These tables are local to the query. The tables exists for the duration of the query
    1. Makes use of spool space.
    1. No data dictionary involvement needed
    1. Tables are created using SQL incorporated within the query.
    1. These tables are local to the session and not just the query. The table is discarded at the end of the session.
    1. Makes use of spool space.
    1. No Data dictionary involvement needed
    1. Table must be explicitly created using the CREATE VOLATILE TABLE syntax.
    1. This table is also local to the session and not just the query. The Table instance is discarded at the end of the session.

    The way Global temp tables are different from the volatile table is that global table has a definition in the Data dictionary . This data definition can be shared by multiple users.

    1. Makes use of temp space.
    1. Data dictionary involvement necessary as the definition is stored in data dictionary.
    1. The table definition in the data dictionary is created using the CREATE GLOBAL TEMPORARY TABLE syntax. The instances of the tables are materialized using many other ways explained later.


Friday 29 March 2013

OLAP - Part 9 - SAMPLE and SAMPLEID


Sampling : SAMPLE and SAMPLEID

  1. The SAMPLE function is used to generate samples of data from a table or view.
     
  1. This can be done in two ways:

  • Sample n where n is integer. It will return n number of rows.(Actual number of rows).
If the number n is greater than the number of rows in the table, the sample will consist of the number of rows in the table.
 
  • Sample n where n is a fraction. That will return that much fraction of the table.(.ie Percentage of the table).  n is a decimal value less than 1.00 and greater than .00

Note that calculations resulting in fractional rows must be greater than .5 to actually return a row.

Example: The table employee has 12 rows. .25 sample means 4.2 rows. As the fraction value is not above .5 only 4 rows will be returned.

A .49 samele means 5.88 rows. As .88 is greater than .5 sample of .49 will return 6 rows.


  1. Rows are not reused within the same sample.

If table has 12 rows . A query like following will return only 12 rows as table has only 12 rows.

select * from employee sample 15;


  1. Check the query as below

Select count(distinct departmentNo) from employee sample 5;

We might think that the above query is creating a sample of 5 records and the  find number of distinct department numbers in the sample. However in actual all the rows in the table are considered and number of distinct departments as 4.

To produce correct result we would have to use a derived table as shown below:

Select count(distinct A. departmentno)  from
(Select * from employee sample 5) A;

This query gives result as 3 which is the count of distinct department numbers within the sample.

If we run the same query again and again the result may vary as the sample output may vary every time.

  1. Multiple sample sets may be generated in a single query if desired.

    In order to identify the specific set a tag called the SAMPLEID is made available for association with each set.

The SAMPLEID may be selected, used for ordering, or used as a column in a new table.

Note that When using multiple sets rows are not repeated to different sample sets.

SAMPLEID is a keyword that assigns a sample-id to a specific sample set.

Example:

Select EMPLOYEEID, SAMPLEID from employee sample .5,.25

Employeeid
SampleId
5
1
10
1
11
1
12
1
1
1
6
1
2
2
4
2
7
2

The sampleid indicates which sample the record belongs to. Here 6 rows belong to sample 1 and 3 rows belong to sample 2.

If the sum of sample increases beyond 1 then it results in an error.

Select EMPLOYEEID, SAMPLEID from employee sample .5,.6;

Error message: SELECT Failed. 5473:  SAMPLE clause has invalid set of arguments. 

As mentioned earlier rows are not repeated with in different samples, even if the there are not enough rows to return

The employee table has only 12 rows. Following is the result when we run the below query

Select EMPLOYEEID, SAMPLEID from employee sample 6,5,4;

Employeeid
SampleId
5
1
3
1
10
1
4
1
2
1
1
1
11
2
9
2
7
2
6
2
8
2
12
3

We get only 12 rows as output as there are no enough rows in the table. Note that the last sample does not enough rows to return . Hence we only have one row with sampleid 3.

OLAP - Part 8 - Quantiles


Using Quantiles:

  1. Quantiles are used to divide a number of rows into a number of partitions of roughly equal size.

  1. The most common quantile is the percentile which is based on a value of 100.We can also have quartiles, deciles etc.

  1. Syntax:

QUANTILE (quantile_constant,sortlist)

quantile_constant = constant used to define number of quantile partitions.

sortlist = column reference to be used for the partitioning and to sort the output. 

  1. Example:

select employeeid, salary, quantile(10,salary) from employee;

Employeeid
Salary
Quantile(10,Salary)
1
1000
0
2
2000
0
3
3000
1
4
4000
2
5
5000
3
6
6000
4
7
7000
5
8
8000
5
9
9000
6
10
10000
7
11
11000
8
12
12000
9

Note that by default the quantile and quantile column will be output in ascending sequence.


The ORDER BY clause can be used to override the ascending sort default.

QUANTILE(10, salary DESC)

The DESC specification is the default for the QUANTILE function. This might appear to be a contradiction to what shown above but in reality it is not.

In looking at the report generated using QUANTILE (10, salary DESC), the output appears to be sorted by salary ascending. The DESC qualifier on the column indicates only that the column ‘salary’ will descend as the quantile value itself descends, regardless of the requested ordering (i.e. ORDER BY) of the output.

select employeeid, salary, quantile(10,salary desc) from employee;

Employeeid
Salary
Quantile(10,Salary)
1
1000
0
2
2000
0
3
3000
1
4
4000
2
5
5000
3
6
6000
4
7
7000
5
8
8000
5
9
9000
6
10
10000
7
11
11000
8
12
12000
9



In other words, the lowest ‘salary’ value  will correspond to the lowest quantile value (0) and conversely the highest ‘salary’  will correspond to the highest quantile value.
In short, adding DESC is unnecessary as it is the default. 


  1. Using ASC is the non-default option. In this case the column ‘salary’ will ascend as the quantile value descends. In other words, the highest will correspond to the lowest quantile value.

With quantiles, an ascending quantile means that the lowest value gets the highest quantile.

select employeeid, salary, quantile(10,salary ASC) from employee;

Employeeid
Salary
Quantile(10,Salary ASC)
12
12000
0
11
11000
0
10
10000
1
9
9000
2
8
8000
3
7
7000
4
6
6000
5
5
5000
5
4
4000
6
3
3000
7
2
2000
8
1
1000
9

  1. Quantile is a OLAP function and hence cannot be combined with normal aggregate functions.

    Hence a query like below is invalid:

SELECT SUM(salary) from employee qualify quantile(10,salary) > 75;

  1. Quantiles can be used to answer questions like “Show all employee salaries that are in the top 25 % of the company.”

We can have multiple columns in the sortlist.
Example:

                  QUANTILE (100, salary_amount, employee_number)
      
The purpose of the ‘employee_number’ here is to handle a situation where there is a tie on the salary amount. Remember, the ‘salary_amount’ parameter (with a default of DESC) tells us that the salary_amount will descend as the quantile value descends.
     
        The default sort for quantiles is the column descends as the quantile descends. Thus, salary   amount descends as quantile descends. If there is a tie in the salary, then employee number descends as the quantile descends.