Teradata Learning- WITH and WITH BY (02/24/2013)



Before we learn Totals and Subtotals we first create a test table using below queries:

CREATE TABLE MY_TEST_DB.SALES_TABLE, NO FALLBACK
(
Product_id INTEGER,
SALE_DATE DATE,
DAILY_SALES DECIMAL(10,2)
)PRIMARY INDEX(PRODUCT_ID)
INDEX(SALE_DATE);


INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(1000        ,        '09/28/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        48850.40);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(1000        ,        '09/29/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        54500.22);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(1000        ,        '09/30/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        36000.07);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(1000        ,        '10/01/2000' (DATE,FORMAT 'MM/dd/yyyy'),        40200.43);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(1000        ,        '10/02/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        32800.50);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(1000        ,        '10/03/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        64300.00);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(1000        ,        '10/04/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        54553.10);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(2000        ,        '09/28/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        41888.88);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(2000        ,        '09/29/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        48000.00);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(2000        ,        '09/30/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        49850.03);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(2000        ,        '10/01/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        54850.29);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(2000        ,        '10/02/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        36021.93);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(2000        ,        '10/03/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        43200.18);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(2000        ,        '10/04/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        32800.50);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(3000        ,        '09/28/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        61301.77);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(3000        ,        '09/29/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        34509.13);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(3000        ,        '09/30/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        43868.86);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(3000        ,        '10/01/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        28000.00);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(3000        ,        '10/02/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        19678.94);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(3000        ,        '10/03/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        21553.79);
INSERT INTO MY_TEST_DB.SALES_TABLE VALUES(3000        ,        '10/04/2000'        (DATE,FORMAT 'MM/dd/yyyy'),        15675.33);




·        WITH .. BY:

·        WITH.. BY is a Teradata extension ( not allowed in ANSI) . This feature works only from BTEQ and not in SQL assistant(ODBC clients).
 
·        WITH… BY is different from GROUP BY in the fact that with GROUP BY detail rows are eliminated. But with WITH .. BY detail rows are not eliminated.
 
·        WITH..BY allows us to provide sub total(or sub-counts, sub - averages) breaks on more than 1 column.
 
·        Also note that WITH .. BY will automatically generate SORT by the columns mentioned in the BY clause.

Example:


select product_id,sale_date,daily_sales from sales_table with SUM(daily_sales) by product_id;

 *** Query completed. 24 rows found. 3 columns returned.
 *** Total elapsed time was 1 second.


 Product_id   SALE_DATE   DAILY_SALES
-----------  ----------  ------------
       1000  2000-09-30      36000.07
       1000  2000-10-02      32800.50
       1000  2000-10-03      64300.00
       1000  2000-10-04      54553.10
       1000  2000-10-01      40200.43
       1000  2000-09-29      54500.22   --> detail row
       1000  2000-09-28      48850.40   --> detail row
                         ------------
       Sum(DAILY_SALES)     331204.72   --> Subtotal

       2000  2000-09-30      49850.03
       2000  2000-10-02      36021.93
       2000  2000-10-03      43200.18
       2000  2000-10-04      32800.50
       2000  2000-10-01      54850.29
       2000  2000-09-29      48000.00
       2000  2000-09-28      41888.88
                         ------------
       Sum(DAILY_SALES)     306611.81

       3000  2000-09-30      43868.86
       3000  2000-10-02      19678.94
       3000  2000-10-03      21553.79
       3000  2000-10-04      15675.33
       3000  2000-10-01      28000.00
       3000  2000-09-29      34509.13
       3000  2000-09-28      61301.77
                         ------------
       Sum(DAILY_SALES)     224587.82

Note that the result is automatically sorted by the column product_id.(column in BY clause). Also note that we can see detail rows along with subtotals.

Running the above query in SQL ASSISTANT (Teradata V13) gives us the detail rows ordered by product_id.
However subtotals are not visible:


·        WITH …. BY multiple aggregates

Multiple aggregates within the same WITH BY clause are possible if all aggregates are based on the same 'break' column.

Example:

select product_id,sale_date,daily_sales from sales_table
with
SUM(daily_sales) ,
AVG(daily_sales),
MAX(Daily_sales)
by product_id;

 *** Query completed. 24 rows found. 3 columns returned.
 *** Total elapsed time was 2 seconds.


 Product_id   SALE_DATE   DAILY_SALES
-----------  ----------  ------------
       1000  2000-09-30      36000.07
       1000  2000-10-02      32800.50
       1000  2000-10-03      64300.00
       1000  2000-10-04      54553.10
       1000  2000-10-01      40200.43
       1000  2000-09-29      54500.22
       1000  2000-09-28      48850.40
                         ------------
                         ------------
                         ------------
       Sum(DAILY_SALES)     331204.72
   Average(DAILY_SALES)      47314.96
   Maximum(DAILY_SALES)      64300.00

       2000  2000-09-30      49850.03
       2000  2000-10-02      36021.93
       2000  2000-10-03      43200.18
       2000  2000-10-04      32800.50
       2000  2000-10-01      54850.29
       2000  2000-09-29      48000.00
       2000  2000-09-28      41888.88
                         ------------
                         ------------
                         ------------
       Sum(DAILY_SALES)     306611.81
   Average(DAILY_SALES)      43801.69
   Maximum(DAILY_SALES)      54850.29

       3000  2000-09-30      43868.86
       3000  2000-10-02      19678.94
       3000  2000-10-03      21553.79
       3000  2000-10-04      15675.33
       3000  2000-10-01      28000.00
       3000  2000-09-29      34509.13
       3000  2000-09-28      61301.77
                         ------------
                         ------------
                         ------------
       Sum(DAILY_SALES)     224587.82
   Average(DAILY_SALES)      32083.97
   Maximum(DAILY_SALES)      61301.77



·        WITH ..BY with multiple columns in BY clause:

We can specify multiple columns as the break columns. If either of the column changes the subtotal value is printed.

select product_id,extract(MONTH FROM sale_date) AS mon_sale_date,daily_sale
s from sales_table
with
SUM(daily_sales) (TITLE 'SUM ') by product_id, extract(MONTH FROM sale_dat
e) ;

 *** Query completed. 27 rows found. 3 columns returned.
 *** Total elapsed time was 1 second.


 Product_id  mon_sale_date   DAILY_SALES
-----------  -------------  ------------
       1000              9      36000.07
       1000              9      54500.22
       1000              9      48850.40
                            ------------
                      SUM      139350.69

       1000             10      54553.10
       1000             10      64300.00
       1000             10      32800.50
       1000             10      40200.43
                            ------------
                      SUM      191854.03

       2000              9      49850.03
       2000              9      48000.00
       2000              9      41888.88
                            ------------
                      SUM      139738.91

       2000             10      32800.50
       2000             10      43200.18
       2000             10      36021.93
       2000             10      54850.29
                            ------------
                      SUM      166872.90

       3000              9      43868.86
       3000              9      61301.77
       3000              9      34509.13
                            ------------
                      SUM      139679.76

       3000             10      15675.33
       3000             10      21553.79
       3000             10      19678.94
       3000             10      28000.00
                            ------------
                      SUM       84908.06

The above result is sorted by PRODUCT_ID (ASC) and MONTH(ASC). If Either of the column value changes a sub total gets printed.


·        WITH … BY with multiple Subtotal levels

We can use multiple 'WITH BY' clauses in an SQL to create multiple subtotal levels.

select
product_id,
extract(MONTH FROM sale_date) AS mon_sale_date,
daily_sales from sales_table
WITH
SUM(daily_sales) (TITLE 'SUM MONTH')  by extract(MONTH FROM sale_date)
WITH
SUM(daily_sales) (TITLE 'SUM PRODUCT ID')  by product_id;

 *** Query completed. 30 rows found. 3 columns returned.
 *** Total elapsed time was 1 second.


 Product_id  mon_sale_date   DAILY_SALES
-----------  -------------  ------------
       1000              9      36000.07
       1000              9      48850.40
       1000              9      54500.22
                            ------------
                 SUM MONTH     139350.69 --> SUM FOR product_id=1000 and month 9

       1000             10      54553.10
       1000             10      64300.00
       1000             10      32800.50
       1000             10      40200.43
                            ------------
                 SUM MONTH     191854.03 --> SUM FOR product_id=1000 and month 10

                            ------------
            SUM PRODUCT ID     331204.72 --> SUM for all months for product_id=1000

       2000              9      49850.03
       2000              9      41888.88
       2000              9      48000.00
                            ------------
                 SUM MONTH     139738.91

       2000             10      32800.50
       2000             10      43200.18
       2000             10      36021.93
       2000             10      54850.29
                            ------------
                 SUM MONTH     166872.90

                            ------------
            SUM PRODUCT ID     306611.81

       3000              9      43868.86
       3000              9      34509.13
       3000              9      61301.77
                            ------------
                 SUM MONTH     139679.76

       3000             10      15675.33
       3000             10      21553.79
       3000             10      19678.94
       3000             10      28000.00
                            ------------
                 SUM MONTH      84908.06

                            ------------
            SUM PRODUCT ID     224587.82


Note that the result is 1st sorted by the column in outer BY .ie product_id and then by the inner BY .ie extracted month.

Thus when using multiple WITH BY , the column in OUTER BY becomes the major sort sequence and the inner ones become minor sort sequence.



·        WITH clause   (without BY)

BY clause is used for creating breaks and printing sub totals.
If we don’t use BY, we will not get subtotals, but WITH clause will print the Grand Total/GRAND average/grand total

select product_id,extract(MONTH FROM sale_date) AS mon_sale_date,daily_sale
s from sales_table
with
SUM(daily_sales) (TITLE 'SUM ') ;

 *** Query completed. 22 rows found. 3 columns returned.
 *** Total elapsed time was 2 seconds.


 Product_id  mon_sale_date   DAILY_SALES
-----------  -------------  ------------
       2000              9      49850.03
       2000             10      36021.93
       2000             10      43200.18
       2000             10      32800.50
       2000             10      54850.29
       2000              9      48000.00
       2000              9      41888.88
       1000              9      36000.07
       1000             10      32800.50
       1000             10      64300.00
       1000             10      54553.10
       1000             10      40200.43
       1000              9      54500.22
       1000              9      48850.40
       3000              9      43868.86
       3000             10      19678.94
       3000             10      21553.79
       3000             10      15675.33
       3000             10      28000.00
       3000              9      34509.13
       3000              9      61301.77
                            ------------
                      SUM      862404.35 --> Grand total

Note that when we use BY clause the result is sorted by that column.
However above column is not sorted by any means.

However we can use the following query to get the results sorted .

select product_id,extract(MONTH FROM sale_date) AS mon_sale_date,daily_sales from sales_table
with
SUM(daily_sales) (TITLE 'SUM ')
order by 1;



·        DISTINCT modifier:

Use the DISTINCT modifier is used in conjuction with the COUNT aggregate to prevent the same value from being counted more than once.

Example:


select product_id,extract(MONTH FROM sale_date) AS mon_sale_date,daily_sale
s from sales_table
with
count(product_id)  (TITLE 'COUNT OF PRODUCT')
order by 1;

 *** Query completed. 22 rows found. 3 columns returned.
 *** Total elapsed time was 2 seconds.


 Product_id  mon_sale_date   DAILY_SALES
-----------  -------------  ------------
       1000              9      36000.07
       1000             10      32800.50
       1000             10      64300.00
       1000             10      54553.10
       1000             10      40200.43
       1000              9      54500.22
       1000              9      48850.40
       2000              9      49850.03
       2000             10      36021.93
       2000             10      43200.18
       2000             10      32800.50
       2000             10      54850.29
       2000              9      48000.00
       2000              9      41888.88
       3000              9      43868.86
       3000             10      19678.94
       3000             10      21553.79
       3000             10      15675.33
       3000             10      28000.00
       3000              9      34509.13
       3000              9      61301.77
-----------
         21

Above query gives number of products. However problem is that same product is counted multiple times.
Hence this is not the unique counts of products.
To get the number of unique products we use the following query:


select product_id,extract(MONTH FROM sale_date) AS mon_sale_date,daily_sale
s from sales_table
with
count(distinct product_id)  (TITLE 'COUNT OF PRODUCT')
order by 1;

 *** Query completed. 22 rows found. 3 columns returned.
 *** Total elapsed time was 1 second.


 Product_id  mon_sale_date   DAILY_SALES
-----------  -------------  ------------
       1000              9      36000.07
       1000             10      32800.50
       1000             10      64300.00
       1000             10      54553.10
       1000             10      40200.43
       1000              9      54500.22
       1000              9      48850.40
       2000              9      49850.03
       2000             10      36021.93
       2000             10      43200.18
       2000             10      32800.50
       2000             10      54850.29
       2000              9      48000.00
       2000              9      41888.88
       3000              9      43868.86
       3000             10      19678.94
       3000             10      21553.79
       3000             10      15675.33
       3000             10      28000.00
       3000              9      34509.13
       3000              9      61301.77
-----------
          3

Note : When using DISTINCT with an aggregate function only a single column or expression may be used.


·        Combining WITH and WITH BY:

We can combine WITH and WITH BY in same query to obtain subtotals as well as grand totals.

select daily_sales,product_id,extract(MONTH FROM sale_date) AS mon_sale_dat
e from sales_table
With SUM(daily_sales)  (TITLE 'SUM OF PRODUCT') BY PRODUCT_ID
WITH SUM(daily_sales)  --> WITH should be after all the WITH BY
order by 3;

 *** Query completed. 25 rows found. 3 columns returned.
 *** Total elapsed time was 1 second.


 DAILY_SALES   Product_id  mon_sale_date
------------  -----------  -------------
    54500.22         1000              9
    36000.07         1000              9
    48850.40         1000              9
    64300.00         1000             10
    54553.10         1000             10
    32800.50         1000             10
    40200.43         1000             10
------------
   331204.72   --> Sub total

    48000.00         2000              9
    49850.03         2000              9
    41888.88         2000              9
    43200.18         2000             10
    32800.50         2000             10
    36021.93         2000             10
    54850.29         2000             10
------------
   306611.81 --> Sub total

    34509.13         3000              9
    43868.86         3000              9
    61301.77         3000              9
    21553.79         3000             10
    15675.33         3000             10
    19678.94         3000             10
    28000.00         3000             10
------------
   224587.82  --> Sub total

------------
   862404.35  --> Grand Total

Note : Above query has a  ORDER BY clause. The WITH BY clause controls the high level sort .i.e. (PRODUCT_ID) and ORDER BY clause controls the minor sort sequence .i.e. (extracted month)



·        WITH and WITH BY rules in brief:

WITH summary-list BY break-list ASC/DESC:

1.    Above is used to provide SUB-totals, sub-counts or sub-averages along with the display of detail rows.

2.    Summary list can contain multiple columns

WITH  SUM(SALARY),AVG(Salary) BY dept_name

3.    Break list can contain multiple columns

WITH SUM(salary) BY dept_no,Salary_code

If either of dept_no or Salary_code changes the sub-total/count/average is printed.

4.    By default the result is shown in the ORDER specified by the columns in BY clause.(break list columns).
We can control ordering sequence.

5.    If the query contains WITH .. BY and ORDER.. BY then WITH .. BY determines major sort key and ORDER BY specifies any additional minor sorts keys.

6.    A SQL statement can have multiple WITH .. BY clauses.

7.    When we have multiple WITH .. BY clauses  the last (outer) WITH .. BY clause is the highest sort sequence.

WITH Sum(salary) with product_sub_code
WITH SUM(salary) WITH product_id

Here product_id is the major sort key and product_sub_code is the minor sort key.


                                                          WITH summary-list

1.    Used to provide grand totals, counts or averages

2.    Summary list can contain multiple columns.

3.    Note that a query can contain only one WITH . (unlike WITH... BY which can be used multiple times)

4.    When a query contains both WITH and WITH BY clause , WITH Should be written after all the WITH .. BY



No comments:

Post a comment