Using Quantiles:
- Quantiles are used to divide a number of rows into a number of partitions of roughly equal size.
- The most common quantile is the percentile which is based on a value of 100.We can also have quartiles, deciles etc.
- Syntax:
QUANTILE
(quantile_constant,sortlist)
quantile_constant = constant used to define number of quantile
partitions.
sortlist = column reference to be used for the partitioning and
to sort the output.
- Example:
select employeeid, salary, quantile(10,salary) from employee;
Employeeid
|
Salary
|
Quantile(10,Salary)
|
1
|
1000
|
0
|
2
|
2000
|
0
|
3
|
3000
|
1
|
4
|
4000
|
2
|
5
|
5000
|
3
|
6
|
6000
|
4
|
7
|
7000
|
5
|
8
|
8000
|
5
|
9
|
9000
|
6
|
10
|
10000
|
7
|
11
|
11000
|
8
|
12
|
12000
|
9
|
Note that by default the
quantile and quantile column will be output in ascending sequence.
The ORDER BY clause can be used to override the ascending sort
default.
QUANTILE(10, salary DESC)
The DESC specification is the default for
the QUANTILE function. This might appear to be a contradiction
to what shown above but in reality it is not.
In looking at the report generated using QUANTILE (10, salary
DESC), the output appears to be sorted by salary ascending. The DESC qualifier
on the column indicates only that the column ‘salary’ will descend as the
quantile value itself descends, regardless of the requested
ordering (i.e. ORDER BY) of the output.
select employeeid, salary, quantile(10,salary desc)
from employee;
Employeeid
|
Salary
|
Quantile(10,Salary)
|
1
|
1000
|
0
|
2
|
2000
|
0
|
3
|
3000
|
1
|
4
|
4000
|
2
|
5
|
5000
|
3
|
6
|
6000
|
4
|
7
|
7000
|
5
|
8
|
8000
|
5
|
9
|
9000
|
6
|
10
|
10000
|
7
|
11
|
11000
|
8
|
12
|
12000
|
9
|
In other words, the lowest
‘salary’ value will correspond to the
lowest quantile value (0) and conversely the highest ‘salary’ will correspond to the highest quantile
value.
In short, adding DESC is
unnecessary as it is the default.
- Using ASC is the non-default option. In this case the column ‘salary’ will ascend as the quantile value descends. In other words, the highest will correspond to the lowest quantile value.
With quantiles, an ascending quantile means that the lowest
value gets the highest quantile.
select employeeid, salary, quantile(10,salary ASC) from
employee;
Employeeid
|
Salary
|
Quantile(10,Salary ASC)
|
12
|
12000
|
0
|
11
|
11000
|
0
|
10
|
10000
|
1
|
9
|
9000
|
2
|
8
|
8000
|
3
|
7
|
7000
|
4
|
6
|
6000
|
5
|
5
|
5000
|
5
|
4
|
4000
|
6
|
3
|
3000
|
7
|
2
|
2000
|
8
|
1
|
1000
|
9
|
- Quantile
is a OLAP function and hence cannot be combined with normal aggregate
functions.
Hence a query like below is invalid:
SELECT SUM(salary) from employee qualify quantile(10,salary)
> 75;
- Quantiles can be used to answer questions like “Show all employee salaries that are in the top 25 % of the company.”
We can have multiple columns in the sortlist.
Example:
QUANTILE (100, salary_amount, employee_number)
The purpose of the ‘employee_number’ here is to handle a
situation where there is a tie on the salary amount. Remember, the
‘salary_amount’ parameter (with a default of DESC) tells us that the
salary_amount will descend as the quantile value descends.
The default sort for quantiles is the column descends as the quantile descends. Thus, salary amount descends as quantile descends. If there is a tie in the salary, then employee number descends as the quantile descends.
need to get top 20 % sellers basis their transactions? How can it be done?
ReplyDelete