Using RANK and COUNT window
function to generate percentile:
We can
use RANK to rank data. By default we provide rank 1 to the highest value as
shown below:
select employeeid, salary, rank(salary) from employee;
Employeeid
|
Salary
|
Rank(Salary)
|
144
|
12345.11
|
1
|
124
|
12345.11
|
1
|
12
|
12000
|
3
|
11
|
11000
|
4
|
10
|
10000
|
5
|
9
|
9000
|
6
|
8
|
8000
|
7
|
7
|
7000
|
8
|
6
|
6000
|
9
|
5
|
5000
|
10
|
4
|
4000
|
11
|
3
|
3000
|
12
|
2
|
2000
|
13
|
1
|
1000
|
14
|
We can
also change the way in which ranks are assigned.
In
above example the highest value gets the 1st rank.
We can
make the lowest value to get the rank 1 and highest value the highest rank by
adding ASC in the RANK function as shown below:
select employeeid, salary,rank(salary ASC) from employee;
Employeeid
|
Salary
|
Rank(Salary ASC)
|
1
|
1000
|
1
|
2
|
2000
|
2
|
3
|
3000
|
3
|
4
|
4000
|
4
|
5
|
5000
|
5
|
6
|
6000
|
6
|
7
|
7000
|
7
|
8
|
8000
|
8
|
9
|
9000
|
9
|
10
|
10000
|
10
|
11
|
11000
|
11
|
12
|
12000
|
12
|
124
|
12345.11
|
13
|
144
|
12345.11
|
13
|
Now
following query divides the RANK with COUNT window output to generate
percentile:
select
Employeeid ,
Salary ,
COUNT(*) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) AS CNTGRP
,rank(salary ASC) AS RNK,
(RNK * 1.00 )/
CNTGRP from employee;
Above
query gives the result as follows , which is similar to percentile
Employeeid
|
Salary
|
CNTGRP
|
RNK
|
((RNK*1.00)/CNTGRP)
|
1
|
1000
|
14
|
1
|
0.07
|
2
|
2000
|
14
|
2
|
0.14
|
3
|
3000
|
14
|
3
|
0.21
|
4
|
4000
|
14
|
4
|
0.29
|
5
|
5000
|
14
|
5
|
0.36
|
6
|
6000
|
14
|
6
|
0.43
|
7
|
7000
|
14
|
7
|
0.5
|
8
|
8000
|
14
|
8
|
0.57
|
9
|
9000
|
14
|
9
|
0.64
|
10
|
10000
|
14
|
10
|
0.71
|
11
|
11000
|
14
|
11
|
0.79
|
12
|
12000
|
14
|
12
|
0.86
|
124
|
12345.11
|
14
|
13
|
0.93
|
144
|
12345.11
|
14
|
13
|
0.93
|
Note
the use of multiplication with 1.00 . It is done with the intension to convert
the result to decimal format, other wise all the values below 1 will be shown
as 0 as we are dividing integer by an integer.
COUNT window function and
QUANTILE:
The
count window function is closely related to quantile function .
Quantile
internally makes use of COUNT window function.
The
relation can be shown as below:
QUANTILE(n,COLUMN)=
(RANK(COLUMN ASC)-1) * n / COUNT window(COLUMN)
Ex: Percentile
select
Employeeid ,
Salary ,
COUNT(salary) OVER (ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS CNTGRP
,rank(salary ASC) AS RNK
,QUANTILE (100,salary)
,((RNK -1 ) * 100 )/ CNTGRP from employee;
Employeeid
|
Salary
|
CNTGRP
|
RNK
|
Quantile(100,Salary)
|
(((RNK-1)*100)/CNTGRP)
|
1
|
1000
|
14
|
1
|
0
|
0
|
2
|
2000
|
14
|
2
|
7
|
7
|
3
|
3000
|
14
|
3
|
14
|
14
|
4
|
4000
|
14
|
4
|
21
|
21
|
5
|
5000
|
14
|
5
|
28
|
28
|
6
|
6000
|
14
|
6
|
35
|
35
|
7
|
7000
|
14
|
7
|
42
|
42
|
8
|
8000
|
14
|
8
|
50
|
50
|
9
|
9000
|
14
|
9
|
57
|
57
|
10
|
10000
|
14
|
10
|
64
|
64
|
11
|
11000
|
14
|
11
|
71
|
71
|
12
|
12000
|
14
|
12
|
78
|
78
|
124
|
12345.11
|
14
|
13
|
85
|
85
|
144
|
12345.11
|
14
|
13
|
85
|
85
|
Ex: Quartile
select
Employeeid ,
Salary ,
COUNT(salary) OVER (ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS CNTGRP
,rank(salary ASC) AS RNK
,QUANTILE (30,salary)
,((RNK -1 ) * 30 )/ CNTGRP from employee;
Employeeid
|
Salary
|
CNTGRP
|
RNK
|
Quantile(30,Salary)
|
(((RNK-1)*30)/CNTGRP)
|
1
|
1000
|
14
|
1
|
0
|
0
|
2
|
2000
|
14
|
2
|
2
|
2
|
3
|
3000
|
14
|
3
|
4
|
4
|
4
|
4000
|
14
|
4
|
6
|
6
|
5
|
5000
|
14
|
5
|
8
|
8
|
6
|
6000
|
14
|
6
|
10
|
10
|
7
|
7000
|
14
|
7
|
12
|
12
|
8
|
8000
|
14
|
8
|
15
|
15
|
9
|
9000
|
14
|
9
|
17
|
17
|
10
|
10000
|
14
|
10
|
19
|
19
|
11
|
11000
|
14
|
11
|
21
|
21
|
12
|
12000
|
14
|
12
|
23
|
23
|
124
|
12345.11
|
14
|
13
|
25
|
25
|
144
|
12345.11
|
14
|
13
|
25
|
25
|
No comments:
Post a Comment