Showing posts with label Teradata COUNT OVER function. Show all posts
Showing posts with label Teradata COUNT OVER function. Show all posts

Saturday, 20 April 2013

Relationship between COUNT OVER, QUANTILE and PERCENTILE


Using RANK and COUNT window function to generate percentile:

We can use RANK to rank data. By default we provide rank 1 to the highest value as shown below:

select employeeid, salary, rank(salary) from employee;

Employeeid
Salary
Rank(Salary)
144
12345.11
1
124
12345.11
1
12
12000
3
11
11000
4
10
10000
5
9
9000
6
8
8000
7
7
7000
8
6
6000
9
5
5000
10
4
4000
11
3
3000
12
2
2000
13
1
1000
14

We can also change the way in which ranks are assigned.
In above example the highest value gets the 1st rank.

We can make the lowest value to get the rank 1 and highest value the highest rank by adding ASC in the RANK function as shown below:

select employeeid, salary,rank(salary ASC) from employee;

Employeeid
Salary
Rank(Salary ASC)
1
1000
1
2
2000
2
3
3000
3
4
4000
4
5
5000
5
6
6000
6
7
7000
7
8
8000
8
9
9000
9
10
10000
10
11
11000
11
12
12000
12
124
12345.11
13
144
12345.11
13


Now following query divides the RANK with COUNT window output to generate percentile:

select
Employeeid ,
Salary ,
COUNT(*) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS CNTGRP
,rank(salary ASC) AS RNK,
(RNK * 1.00 )/ CNTGRP from employee;

Above query gives the result as follows , which is similar to percentile

Employeeid
Salary
CNTGRP
RNK
((RNK*1.00)/CNTGRP)
1
1000
14
1
0.07
2
2000
14
2
0.14
3
3000
14
3
0.21
4
4000
14
4
0.29
5
5000
14
5
0.36
6
6000
14
6
0.43
7
7000
14
7
0.5
8
8000
14
8
0.57
9
9000
14
9
0.64
10
10000
14
10
0.71
11
11000
14
11
0.79
12
12000
14
12
0.86
124
12345.11
14
13
0.93
144
12345.11
14
13
0.93

Note the use of multiplication with 1.00 . It is done with the intension to convert the result to decimal format, other wise all the values below 1 will be shown as 0 as we are dividing integer by an integer.



COUNT window function and QUANTILE:

The count window function is closely related to quantile function .

Quantile internally makes use of COUNT window function.

The relation can be shown as below:

QUANTILE(n,COLUMN)=  (RANK(COLUMN ASC)-1) * n / COUNT window(COLUMN)

Ex: Percentile
select
Employeeid ,
Salary ,
COUNT(salary) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS CNTGRP
,rank(salary ASC) AS RNK
,QUANTILE (100,salary)
,((RNK -1 ) * 100 )/ CNTGRP from employee;

Employeeid
Salary
CNTGRP
RNK
Quantile(100,Salary)
(((RNK-1)*100)/CNTGRP)
1
1000
14
1
0
0
2
2000
14
2
7
7
3
3000
14
3
14
14
4
4000
14
4
21
21
5
5000
14
5
28
28
6
6000
14
6
35
35
7
7000
14
7
42
42
8
8000
14
8
50
50
9
9000
14
9
57
57
10
10000
14
10
64
64
11
11000
14
11
71
71
12
12000
14
12
78
78
124
12345.11
14
13
85
85
144
12345.11
14
13
85
85

Ex: Quartile
select
Employeeid ,
Salary ,
COUNT(salary) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS CNTGRP
,rank(salary ASC) AS RNK
,QUANTILE (30,salary)
,((RNK -1 ) * 30 )/ CNTGRP from employee;

Employeeid
Salary
CNTGRP
RNK
Quantile(30,Salary)
(((RNK-1)*30)/CNTGRP)
1
1000
14
1
0
0
2
2000
14
2
2
2
3
3000
14
3
4
4
4
4000
14
4
6
6
5
5000
14
5
8
8
6
6000
14
6
10
10
7
7000
14
7
12
12
8
8000
14
8
15
15
9
9000
14
9
17
17
10
10000
14
10
19
19
11
11000
14
11
21
21
12
12000
14
12
23
23
124
12345.11
14
13
25
25
144
12345.11
14
13
25
25

Teradata COUNT window Function - Generating totals and subtotals


We have learnt using WITH and WITH BY functions earlier.
  • WITH BY is used to create subtotals and >
  • WITH is used to create grand totals.

BUT problem with using these functions is that WITH and WITHBY are both Teradata extensions . They are not ANSI standards.
Also we know that we cannot use WITH and WITH BY to get relational result.

Hence we make use of COUNT window functions to create subtotals or grand totals.
COUNT window is the ANSI standard.




Generating SUBTOTALs-- Using WITH ...BY and using ANSI Standard COUNT function

Following is the Teradata way of generating the subtotals using the WITH..BY

select departmentNo, Employeeid, Salary
FROM EMPLOYEE
WITH COUNT(departmentNo) BY departmentno;

 *** Query completed. 18 rows found. 3 columns returned.
 *** Total elapsed time was 1 second.


DepartmentNo   Employeeid      Salary
------------  -----------  ----------
         100            1     1000.00
         100            9     9000.00
         100            5     5000.00
------------
           3

         200          124    12345.11
         200            2     2000.00
         200            6     6000.00
         200           10    10000.00
------------
           4

         300            3     3000.00
         300            7     7000.00
         300           11    11000.00
------------
           3

         400            8     8000.00
         400            4     4000.00
         400           12    12000.00
         400          144    12345.11
------------
           4


The result is non relational and hence this query cannot be run using SQL assistant.
We need to run this query using BTEQ.

We can get the similar result using ANSI standard COUNT OVER function.

select
departmentNo,
Employeeid,
Salary,
COUNT(*) OVER (PARTITION BY departmentNO
ROWS BETWEEN UNBOUNDED PRECEEDING AND UNBOUNDED FOLLOWING)
FROM EMPLOYEE
;

DepartmentNo
Employeeid
Salary
Group Count(*)
100
5
5000
3
100
1
1000
3
100
9
9000
3
200
124
12345.11
4
200
2
2000
4
200
6
6000
4
200
10
10000
4
300
3
3000
3
300
11
11000
3
300
7
7000
3
400
12
12000
4
400
8
8000
4
400
4
4000
4
400
144
12345.11
4


Note that this Output is relational and can be run in SQL assistant.

What does the PARTITION BY do?

The 'PARTITION BY ' clause defines the group in which the rows are to be counted.
It does the same function that 'BY' clause does in the teradata way of getting subtotals-  .i.e Establishing Groups.

The count gets showed in each row within a group and hence appear multiple times.

COUNT Window function and standard aggregate functions (.ie SUM,AVG,COUNT,MIN,MAX) may not be used in same select.



Generating Grand TOTALs-- Using WITH ...BY and using ANSI Standard COUNT function



Following is the Teradata Way of generating grand totals using WITH

select
departmentNo,
Employeeid,
Salary
FROM EMPLOYEE
WITH COUNT(departmentNo);

 *** Query completed. 15 rows found. 3 columns returned.
 *** Total elapsed time was one minute and 4 seconds.


DepartmentNo   Employeeid      Salary
------------  -----------  ----------
         200          124    12345.11
         400            8     8000.00
         200            2     2000.00
         200            6     6000.00
         400            4     4000.00
         400           12    12000.00
         100            1     1000.00
         300            3     3000.00
         200           10    10000.00
         300            7     7000.00
         300           11    11000.00
         100            9     9000.00
         100            5     5000.00
         400          144    12345.11
------------
          14

The result is non relational and hence this query cannot be run using SQL assistant.
We need to run this query using BTEQ.

We can get the similar result using ANSI standard COUNT OVER function.

select
departmentNo,
Employeeid,
Salary,
COUNT(*) OVER(
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM EMPLOYEE

;

DepartmentNo
Employeeid
Salary
Group Count(*)
100
1
1000
14
100
5
5000
14
100
9
9000
14
200
2
2000
14
200
6
6000
14
200
10
10000
14
200
124
12345.11
14
300
3
3000
14
300
7
7000
14
300
11
11000
14
400
4
4000
14
400
8
8000
14
400
12
12000
14
400
144
12345.11
14

As we can see the total count is output on each row.