Teradata Learning - Aggregate Functions (02/24/2013)


  • Important aggregate functions are:

  1. COUNT
  2. SUM
  3. AVG
  4. MAX(MAXIMUM)
  5. MIN(MINIMUM)

  • AGGREGATE functions ignore NULL values.

If we perform COUNT(SALARY) and if 3 of 5 rows in the table have salary as NULL then the result will be only 2.
Similarly if we do AVG(SALARY) then average would reflect average of only 2 values.

Note: To count all rows irrespective of column being null then use COUNT(*)


  • Aggregates with GROUP BY:

Aggregate functions give only one row as the result. They provide aggregation over the entire set of input rows.

Example: if Employee table contains salary details, SUM(Salary) would give sum of salary for all records.
However we may need Sum of salaries for each department.
We can query the table for each department using where clause. However this will be inefficient in case number of departments is more.

In such cases we can use GROUP BY clause. We would do a SUM(Salary) along with 'GROUP BY department_name'. We need to think of GROUP BY as 'For Each'. It will give us the sum of salaries for each department.

Example:

SELECT department_name         ,SUM (salary_amount)
        FROM   employee_table
        GROUP BY
department_name
        ;

Department_name
SUM(salary_amount)
Sales
400000
Marketing
200000
IT
100000

We can use WHERE clause to restrict the SUM of salary to specific departments. WHERE clause eliminates rows before GROUP BY puts them into desired groupings.

SELECT department_name         ,SUM (salary_amount)
        FROM   employee_table
        GROUP BY
department_name
        WHERE department_name like '%a%';

Department_name
SUM(salary_amount)
Sales
400000
Marketing
200000


  • GROUP BY on Multiple Columns:

It is possible to GROUP BY more than one column. This allows grouping by multiple columns.

SELECT department_name         ,job_code, SUM (salary_amount)
        FROM   employee_table
        GROUP BY
department_name,job_code
        ;

  • Rules of aggregation:

  1. Whenever SELECT part of any query contains aggregates and non-aggregates then we need to have a GROUP BY

select databasename, sum(currentperm) as currentperm_in_kb from DBC.tablesize;

Above query will give following error code:

SELECT Failed. 3504:  Selected non-aggregate values must be part of the associated group. 

  1. All the non-aggregated selected columns must be included in the GROUP BY clause.

This means the above query should look like as follows:

select databasename, sum(currentperm) as currentperm_in_kb from DBC.tablesize GROUP BY databasename;



  • HAVING clause:

HAVING Clause is similar to WHERE clause in that it helps us eliminate rows (or rather qualify rows).
However difference between the two is that HAVING applies to groups rather than rows.

WHERE clause is applied before performing aggregation and HAVING clause is applied after the aggregation.

SELECT department_name,Sum(salary)
FROM employee_table
GROUP BY department_name
HAVING SUM(salary) > 10000;


No comments:

Post a Comment