Teradata FAQ's: 2012

OLAP stands for On-Line Analytical Processing.
The following are important OLAP functions:

· RANK - (Rankings)

· QUANTILE - (Quantiles)

· CSUM - (Cumulation)

· MAVG - (Moving Averages)

· MSUM - (Moving Sums)

· MDIFF - (Moving Differences)

· MLINREG - (Moving Linear Regression)

OLAP functions are similar to aggregate functions in that they operate on groups of rows. However, unlike aggregate functions, they also return the detail rows. This is not possible with aggregate functions.
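The contrast can be sketched in plain Python. This is only a conceptual illustration, not Teradata code; the sample rows and variable names are hypothetical.

```python
# Hypothetical sample rows: (employeeid, salary)
rows = [(1, 10000), (2, 10000), (3, 20000)]

# Aggregate-style: the whole group collapses into a single value.
total = sum(salary for _, salary in rows)

# OLAP-style: every detail row survives, with a computed value attached
# (here a running total, similar in spirit to CSUM).
running = 0
olap_rows = []
for employeeid, salary in rows:
    running += salary
    olap_rows.append((employeeid, salary, running))

print(total)      # one value for the whole group
print(olap_rows)  # one output row per input row
```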

===========================================

Cumulative Sum:

1. Cumulative sum (CSUM) computes a running or cumulative total of a column’s value.

2. Syntax: CSUM(columnname,sortlist)

Here the sortlist specifies the column(s) by which the data is sorted before the cumulative sum is computed. The default sort order is ascending.

Example:

SELECT EMPLOYEEID , CSUM(SALARY,EMPLOYEEID) FROM EDW_RESTORE_TABLES.TESTY;

    employeeid   CSum(Salary,employeeid)
    1            10000
    2            20000
    2            40000
    4            60000
    5            90000
    6            100000
    7            110000
    8            120000

It is very important to note that CSUM requires a sort list: the rows are first sorted by it, and the cumulative sum is then computed over the named column in that order.

SELECT CSUM(salary) FROM table1; gives an error because no sort column is specified.

Note:
SELECT CSUM(salary,salary) FROM table1; is also a valid query.
The data is first sorted by the salary column, and the cumulative sum is then computed over it.

3. Example:

SELECT salesdate, sales, CSUM(sales, salesdate) FROM daily_sales;

    salesdate   sales     Csum
    98/01/01    150.00    150.00
    98/01/02    200.00    350.00
    98/01/03    250.00    600.00
    98/01/05    350.00    950.00
    98/01/10    550.00    1500.00
    98/01/21    150.00    1650.00
    98/01/25    200.00    1850.00
    98/01/31    100.00    1950.00
    98/02/01    150.00    2100.00
    98/02/03    250.00    2350.00
    98/02/06    350.00    2700.00
    98/02/17    550.00    3250.00
    98/02/20    450.00    3700.00
    98/02/27    350.00    4050.00

In the above example, the data is sorted by salesdate and the cumulative sum is computed over the sales column.
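The sort-then-accumulate behaviour described above can be emulated in plain Python. This is a minimal sketch, not how Teradata implements CSUM; the helper name csum and the ISO date strings are assumptions made for illustration.

```python
def csum(rows, value_key, sort_key):
    """Emulate CSUM(value, sortcol): sort ascending, then keep a running total."""
    total = 0
    out = []
    for row in sorted(rows, key=lambda r: r[sort_key]):
        total += row[value_key]
        out.append({**row, "csum": total})
    return out


# First three rows of the daily_sales example, deliberately out of order.
daily_sales = [
    {"salesdate": "1998-01-03", "sales": 250},
    {"salesdate": "1998-01-01", "sales": 150},
    {"salesdate": "1998-01-02", "sales": 200},
]

for r in csum(daily_sales, "sales", "salesdate"):
    print(r["salesdate"], r["sales"], r["csum"])
```

Because the sort happens inside the helper, the input order does not matter, which mirrors why CSUM insists on a sort list.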

===========================================

Cumulative Sums With Reset:

1. A cumulative sum may be reset to zero at specified breakpoints.

2. This is done with the help of the GROUP BY clause. GROUP BY indicates that when the value of the column specified in it changes, the cumulative sum should be reset.

3. It is also very important to note that OLAP functions and standard aggregations (SUM, COUNT, AVG, MIN, MAX) are not compatible within the same query. Since OLAP and aggregate functions can't be used together, GROUP BY serves a separate purpose in each type of query.

4. Example:

SELECT EMPLOYEEID , DEPARTMENTNO,CSUM(SALARY,EMPLOYEEID) FROM EDW_RESTORE_TABLES.TESTY GROUP BY DEPARTMENTNO;

    employeeid   departmentno   CSum(Salary,employeeid)
    1            100            10000
    2            100            20000
    2            200            20000   --> accumulation resets
    4            200            40000
    5            300            30000   --> accumulation resets
    6            300            40000
    7            400            10000   --> accumulation resets
    8            500            10000   --> accumulation resets

Note that we are using GROUP BY. Here its function is different from what it does with aggregate functions: it provides the partitioning logic that decides where the cumulative sum resets.
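The reset behaviour can be emulated in plain Python by keeping one running total per group. This is a sketch under stated assumptions: the helper name csum_with_reset is hypothetical, and the input rows are reconstructed from the result tables in this post rather than taken from the actual table.

```python
def csum_with_reset(rows):
    """Running total of salary by employeeid, restarted for each departmentno.

    rows: iterable of (employeeid, departmentno, salary) tuples.
    Returns (employeeid, departmentno, running_total) tuples.
    """
    totals = {}  # departmentno -> running total so far
    out = []
    # Order by the group column first, then by the sort column within the group.
    for employeeid, departmentno, salary in sorted(rows, key=lambda r: (r[1], r[0])):
        totals[departmentno] = totals.get(departmentno, 0) + salary
        out.append((employeeid, departmentno, totals[departmentno]))
    return out


# Rows reconstructed from the example output (an assumption).
employees = [
    (1, 100, 10000), (2, 100, 10000),
    (2, 200, 20000), (4, 200, 20000),
    (5, 300, 30000), (6, 300, 10000),
    (7, 400, 10000), (8, 500, 10000),
]

for row in csum_with_reset(employees):
    print(row)
```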

===========================================

Moving Average (MAVG):

1. Used to calculate a moving average on a column.

2. The number of rows used for the aggregation operation is called the query width.

3. Syntax:

MAVG(colname, n, sortlist)

colname = the column on which the moving average is computed
n = the number of rows (< 4096) which will be used in the calculation, including the current row
('n' is also referred to as the 'width' of the average)
sortlist = the column(s) which determine the sequencing of the rows. Default is ascending.

4. This function computes the moving AVG of a column based on some number of preceding rows.

5. If the number of rows preceding the current row is less than the width, the average is computed based on the existing preceding rows.

6. Example:

SELECT EMPLOYEEID, SALARY, MAVG(SALARY,2,EMPLOYEEID) FROM EDW_RESTORE_TABLES.TESTY;

    employeeid   Salary   MAvg(Salary,2,employeeid)
    1            10000    10000.00
    2            10000    10000.00   avg of current row and prev row
    2            20000    15000.00
    4            20000    20000.00
    5            30000    25000.00
    6            10000    20000.00
    7            10000    10000.00
    8            10000    10000.00

For the computation, the current row and the preceding n-1 rows are used.

If there are fewer than n-1 preceding rows, all the preceding rows are used.

By default, the rows are sorted in ascending order by the sortlist column(s).
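The windowing rule above can be checked with a small Python emulation. This is a sketch, not Teradata code; the helper name mavg is hypothetical, and the input list is assumed to be already ordered by the sortlist column.

```python
def mavg(values, n):
    """Moving average over the current row and up to n-1 preceding rows."""
    out = []
    for i in range(len(values)):
        # Near the start, the window simply contains fewer rows.
        window = values[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


# Salaries from the example, already ordered by employeeid.
salaries = [10000, 10000, 20000, 20000, 30000, 10000, 10000, 10000]
print(mavg(salaries, 2))
```

With a width of 2, this reproduces the MAvg column of the example result.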

===========================================

Moving Sum (MSUM):

1. Used to calculate a moving sum on a column.

2. The number of rows used for the aggregation operation is called the query width.

3. Syntax:

MSUM(colname, n, sortlist)

colname = the column on which the moving sum is computed
n = the number of rows (< 4096) which will be used in the calculation, including the current row
('n' is also referred to as the 'width' of the sum)
sortlist = the column(s) which determine the sequencing of the rows. Default is ascending.

4. Example:

SELECT salesdate, itemid, sales, MSUM(sales, 3, salesdate)
FROM daily_sales
WHERE itemid = 10;

Result

    salesdate   itemid   sales     MSum
    98/01/01    10       150.00    150.00
    98/01/02    10       200.00    350.00     Sum of 2 rows
    98/01/03    10       250.00    600.00     Sum of 3 rows starting row 1
    98/01/05    10       350.00    800.00     Sum of 3 rows starting row 2
    98/01/10    10       550.00    1150.00    Sum of 3 rows starting row 3
    98/01/21    10       150.00    1050.00
    98/01/25    10       200.00    900.00     Sum of 3 rows
    98/01/31    10       100.00    450.00
    98/02/01    10       150.00    450.00
    98/02/03    10       250.00    500.00
    98/02/06    10       350.00    750.00
    98/02/17    10       550.00    1150.00
    98/02/20    10       450.00    1350.00
    98/02/27    10       350.00    1350.00

5. Moving Sum (MSum) follows the same rules as Moving Averages (MAvg):

· Uses the current row and the preceding n-1 rows.

· Uses all preceding rows if there are fewer than n-1.

· Sorts ascending by sortlist column(s) by default.
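The same window rule applied to a sum can be sketched in plain Python. The helper name msum is hypothetical, this is not Teradata code, and the input is assumed to be already ordered by salesdate.

```python
def msum(values, n):
    """Moving sum over the current row and up to n-1 preceding rows."""
    return [sum(values[max(0, i - n + 1): i + 1]) for i in range(len(values))]


# First seven sales values from the example, ordered by salesdate.
sales = [150, 200, 250, 350, 550, 150, 200]
print(msum(sales, 3))
```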

=======================================

Moving Differences (MDIFF):

1. The Moving Difference (MDIFF) function permits a calculation of a moving difference of a specified column, based on a defined query width (n).

2. The width determines how many rows back to count for the subtrahend (i.e. the number being subtracted).

3. If there are fewer than n preceding rows, a null is generated to represent the difference.

4. Syntax:

MDIFF(colname, n, sortlist)
colname = the column on which the moving difference is computed
n = the number of rows (< 4096) to count back to find the value being subtracted
sortlist = the column(s) which determine the sequencing of the rows. Default is ascending.

5. Example:

SELECT salesdate, itemid, sales, MDIFF(sales, 3, salesdate)

FROM daily_sales;

Result

    salesdate   itemid   sales     MDiff
    98/01/01    10       150.00    ?          Null because there is no value 3 rows above
    98/01/02    10       200.00    ?          Null because there is no value 3 rows above
    98/01/03    10       250.00    ?          Null because there is no value 3 rows above
    98/01/05    10       350.00    200.00     Difference between this row and row 1
    98/01/10    10       550.00    350.00     Difference between this row and row 2
    98/01/21    10       150.00    -100.00
    98/01/25    10       200.00    -150.00
    98/01/31    10       100.00    -450.00
    98/02/01    10       150.00    .00
    98/02/03    10       250.00    50.00      Difference of 2 rows
    98/02/06    10       350.00    250.00
    98/02/17    10       550.00    400.00
    98/02/20    10       450.00    200.00
    98/02/27    10       350.00    .00

6. The usage of MDIFF is slightly different from that of MAvg and MSum:

· It uses the current row and the preceding nth row.

· The value is null if there is no preceding nth row.

· Sorting ascending by sortlist column(s) is the default.
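The "current row minus the row n positions back" rule can be emulated in plain Python. This is a sketch with a hypothetical helper name mdiff; None stands in for SQL null, and the input is assumed to be already ordered by salesdate.

```python
def mdiff(values, n):
    """Current value minus the value n rows earlier; None when no such row exists."""
    return [values[i] - values[i - n] if i >= n else None
            for i in range(len(values))]


# First seven sales values from the example, ordered by salesdate.
sales = [150, 200, 250, 350, 550, 150, 200]
print(mdiff(sales, 3))
```

Unlike the moving sum and average, only two rows take part in each result, so the first n rows have nothing to subtract and come back as None.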

========================================

Rank function:

1. The rank function allows a column to be ranked either based on high or low order, against other rows in the answer set.

2. By default, the output is sorted in descending order of the ranking column. In short, the highest value in the ranked column gets rank 1.

3. Syntax:

RANK(colname)

where colname represents the column to be ranked and the descending sort key of the result.

4. Example:

SELECT storeid, prodid, sales, RANK(sales)

FROM salestbl

WHERE storeid = 1001;

    storeid   prodid   sales        Rank
    1001      F        150000.00    1
    1001      A        100000.00    2
    1001      C        60000.00     3
    1001      D        35000.00     4

5. Points to note are:

· When ranking is applied, the highest amount gets the lowest rank number by default.

· The default sort sequence is descending by ranking column (sales).
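The default ranking order can be emulated in plain Python. This is a sketch with a hypothetical helper name rank_desc, not Teradata code. How ties are handled is an assumption here: equal values share a rank and the following rank number is skipped.

```python
def rank_desc(rows):
    """Rank (prodid, sales) rows by sales, highest first.

    Assumption: tied sales values share the same rank.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ranked = []
    for position, (prodid, sales) in enumerate(ordered, start=1):
        if ranked and ranked[-1][1] == sales:
            rank = ranked[-1][2]  # tie: reuse the previous row's rank
        else:
            rank = position
        ranked.append((prodid, sales, rank))
    return ranked


# Store 1001 rows from the example, in arbitrary input order.
store_1001 = [("A", 100000), ("C", 60000), ("D", 35000), ("F", 150000)]
for row in rank_desc(store_1001):
    print(row)
```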

=======================================================

RANK with ‘Qualify’ and ‘GROUP BY’:

1. The QUALIFY clause allows restriction of which rankings will be output in the final result.

2. QUALIFY works like the HAVING clause: it restricts the output to rows that satisfy a condition, here a specific range of ranks.

3. Example:

SELECT storeid, prodid, sales, RANK(sales)

FROM salestbl

GROUP BY storeid

QUALIFY rank(sales) <= 3;

Result

    storeid   prodid   sales        Rank
    1001      F        150000.00    1
    1001      A        100000.00    2
    1001      C        60000.00     3
    1002      A        40000.00     1
    1002      C        35000.00     2
    1002      D        25000.00     3
    1003      B        65000.00     1
    1003      D        50000.00     2
    1003      A        30000.00     3

4. Note that one more feature is being used here, i.e. GROUP BY.

The GROUP BY isn't doing an aggregation. It is actually changing the scope of the ranking, so that ranks are assigned within each store. It also causes an ascending sort on the grouping column.

5. The default sort sequence is descending by ranking column (sales).

Due to GROUP BY, the result is sorted by storeid ascending. Within a given store, the sort is descending by the ranked column, i.e. sales.

6. QUALIFY rank(sales) <= 3 means that only rows with a rank less than or equal to 3 are output. This gives the three top-selling products for each store.
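The combination of per-store ranking and rank filtering can be emulated in plain Python. This is a sketch with a hypothetical helper name top_n_per_store; it assumes no tied sales values within a store, and the store 1002 rows are limited to those visible in the result above.

```python
from collections import defaultdict


def top_n_per_store(rows, n):
    """Rank sales within each store (highest first) and keep ranks <= n.

    rows: iterable of (storeid, prodid, sales) tuples.
    Assumption: no tied sales values within a store.
    """
    by_store = defaultdict(list)
    for storeid, prodid, sales in rows:
        by_store[storeid].append((prodid, sales))

    result = []
    for storeid in sorted(by_store):  # stores in ascending order
        ordered = sorted(by_store[storeid], key=lambda p: p[1], reverse=True)
        for rank, (prodid, sales) in enumerate(ordered, start=1):
            if rank <= n:  # the QUALIFY-style filter on the rank
                result.append((storeid, prodid, sales, rank))
    return result


salestbl = [
    (1002, "D", 25000), (1001, "A", 100000), (1001, "D", 35000),
    (1002, "A", 40000), (1001, "F", 150000), (1001, "C", 60000),
    (1002, "C", 35000),
]
for row in top_n_per_store(salestbl, 3):
    print(row)
```

The filter runs after the ranks are assigned, which is why a plain WHERE-style condition on the input rows could not express it.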

========================================

Wednesday, 1 February 2012

Explain OLAP and Important OLAP functions?