Tuesday 31 January 2012

What is a System Calendar?

  • We can extend the properties of DATE by joining with the System Calendar.
  • The System Calendar can easily be joined with other tables, just like a dimension table in a star schema.
  • The System Calendar does not need any manual maintenance. To reap the benefits it has to offer, all we need to do is join with it.
  • The System Calendar has one row for every day from January 1, 1900 through December 31, 2100, a span of about 200 years.
  • The System Calendar, as implemented for Teradata, is a high performance set of nested views.
  • The System Calendar is actually a 4-level nested view. Each level of view built on top of the table adds intelligence to the date. 
  • At the lowest level, the underlying table is Sys_calendar.CalDates. CalDates has one column called "cdate" with a data type of DATE, which is also the unique primary index. The table has one row for each date in the calendar.
  • Following are the columns that are accessible through the System Calendar:
calendar_date DATE UNIQUE (Standard Teradata date)
day_of_week BYTEINT, (1-7, where 1 = Sunday)
day_of_month BYTEINT, (1-31)
day_of_year SMALLINT, (1-366)
day_of_calendar INTEGER, (Julian days since 01/01/1900)
weekday_of_month BYTEINT, (nth occurrence of day in month)
week_of_month BYTEINT, (0-5)
week_of_year BYTEINT, (0-53) (partial week at start of year is 0)
week_of_calendar INTEGER, (0-n) (partial week at start is 0)
month_of_quarter BYTEINT, (1-3)
month_of_year BYTEINT, (1-12)
month_of_calendar INTEGER, (1-n) (Month number since Jan, 1900)
quarter_of_year BYTEINT, (1-4)
quarter_of_calendar INTEGER, (quarter number since Jan, 1900)
year_of_calendar SMALLINT, (year since 1900)

  • Calendar is the view under the Sys_calendar database that is generally used for complex date calculations. Calendar_date is the key column.
Example:
SELECT * FROM Sys_calendar.Calendar WHERE calendar_date = current_date;
calendar_date: 98/09/21
day_of_week: 2
day_of_month: 21
day_of_year: 264
day_of_calendar: 36058
weekday_of_month: 3
week_of_month: 3
week_of_year: 38
week_of_calendar: 5151
month_of_quarter: 3
month_of_year: 9
month_of_calendar: 1185
quarter_of_year: 3
quarter_of_calendar: 395
year_of_calendar: 1998
  • Example of using Sys_Calendar.Calendar for joining with other tables:
Find the sum of sales in the daily_sales table for item id 10 in the 1st quarter of 2011.
A:
Here we can do a normal sum on the sales column of the daily_sales table.
However, for the additional logic of "1st quarter of 2011" we need the System Calendar.
All we need to do is join the date column in the daily_sales table with the calendar_date
column in the Calendar view and then add the additional filter clauses.
Following is how the query would look:
 
SELECT item_id, SUM(sales)
FROM daily_sales A, Sys_calendar.Calendar SC
WHERE A.sale_date = SC.calendar_date
AND SC.quarter_of_year = 1
AND SC.year_of_calendar = 2011
AND A.item_id = 10
GROUP BY 1;
 
  • What is a Today view:
Many times we need to perform queries that are relative to today's date.
Hence we create a view on top of the Sys_calendar.Calendar view to return information
relative to the current day, week, month, quarter or year.
Following is how the view would look:
 
CREATE VIEW today AS (
            SELECT * FROM sys_calendar.calendar
            WHERE calendar.calendar_date=current_date);
 
Example:
 
Compare sales for item 10 during the current and previous month for both this 
year and last year. 
Answer: 
SELECT sc.year_of_calendar AS "year"
   ,sc.month_of_year AS "month"
   ,ds.item_id
   ,SUM(ds.sales)
FROM sys_calendar.calendar sc
     ,daily_sales ds
     ,today td
WHERE sc.calendar_date = ds.sale_date
AND  ((sc.month_of_calendar BETWEEN td.month_of_calendar - 1
            AND td.month_of_calendar)
OR     (sc.month_of_calendar BETWEEN td.month_of_calendar - 13
            AND td.month_of_calendar - 12))
AND ds.item_id = 10
GROUP BY 1,2,3
ORDER BY 1,2;
 
  • Note the particular way in which the column 'week_of_month' works.
    The System Calendar identifies each week of the month with a number
    from 0 to 5, with 0, if present, indicating the first partial week of the month.
    Week number 1 will always be the first full week of the month.
  • day_of_week is assigned values 1 to 7, representing each day of the week,
    where Sunday is day one. (Both behaviours are illustrated in the sketch after this list.)
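
The query below is a quick way to see both columns in action (a sketch against the standard view; February 2012 begins on a Wednesday, so the 1st through the 4th fall in week 0):

SELECT calendar_date, day_of_week, week_of_month
FROM Sys_calendar.Calendar
WHERE year_of_calendar = 2012
AND month_of_year = 2
ORDER BY calendar_date;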

Friday 27 January 2012

What is the difference between TIME and CURRENT_TIME?

  • The first difference is that TIME is a Teradata extension, while CURRENT_TIME is ANSI standard.

  • The second difference is that the data type returned by the TIME function is a FLOAT.
However, CURRENT_TIME or CURRENT_TIME(0) returns the type TIME(0) WITH TIME ZONE.

  • The TIME function lacks time intelligence; arithmetic on it is plain numeric arithmetic on a FLOAT.
This means SELECT TIME + 40; would give an invalid result if the current time is 23:24:34, because 34 + 40 puts 74 in the seconds position, which is not a valid time.


NOTE: This is not the case with DATE and CURRENT_DATE.
Both of them return the DATE data type, which has calendar intelligence.
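
By contrast, CURRENT_TIME supports proper interval arithmetic (a minimal sketch; the rollover past 60 seconds is handled correctly):

SELECT CURRENT_TIME + INTERVAL '40' SECOND;
-- e.g. at 23:24:34 this correctly yields 23:25:14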

How to display TIME in 24-hour and 12-hour format?

We use the format clause for this purpose.

SELECT CURRENT_TIME (FORMAT 'HH:MI:SS'); ---> this gives time in 24-hour format
SELECT CURRENT_TIME (FORMAT 'HH:MI:SSBT'); --> this gives time in 12-hour format with AM/PM.

The 'B' represents a blank and the 'T' represents the 12-hour format with AM/PM.


How to know who accessed a database object?

We can use the DBC view DBC.AccessLog.
This view contains information about who accessed a given database object.

Another way to find this out is through query logging (DBQL).
The views under the DBC database whose names start with DBQL expose the tables that hold the logging information.
However, the data that gets stored in these tables is controlled by rules that are set up by the admin.
These rules can be viewed through the view DBC.DBQLRules.
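
A sketch of a query against DBC.AccessLog (this assumes access logging has been enabled for the object via BEGIN LOGGING, and uses what I believe are the usual view columns):

SELECT LogDate, LogTime, UserName, AccessType, TVMName
FROM DBC.AccessLog
WHERE DatabaseName = 'DATABASE1'
ORDER BY LogDate, LogTime;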


How to view CLOB and BLOB datatypes in SQL Assistant?

CLOB and BLOB both fall into the category called LOB, i.e. large object.
BLOB stands for binary large object and CLOB stands for character large object.

  • BLOB is used for Pictures, Music, Word documents.
  • CLOB is used for text data such as Text, HTML, XML or Rich Text (RTF).

When we query a table containing a BLOB or CLOB using SQL Assistant, it will ask us to provide a filename that will be used to store the data on the local machine. Each LOB data value will be written to a separate file.
The name we provide will be used as a base, followed by a number for each row.

Support for large objects was added to SQL Assistant in version 6.2.
Some (older) versions of SQL Assistant require that you set the ODBC DSN option "Use Native Large Object support"  for this to work.

How to calculate current space and maximum perm space of a user or database?

We can use the view DBC.AllSpace to calculate the current perm space and the maximum perm space.
However, the catch here is that space in DBC.AllSpace is reported per AMP (i.e. per vproc).
Hence we need to apply the SUM function to the columns CurrentPerm and MaxPerm.

If we need to calculate the current and max perm space for the database 'DATABASE1', then the following would be the query:

SELECT SUM(CurrentPerm), SUM(MaxPerm) FROM DBC.AllSpace WHERE DatabaseName = 'DATABASE1' AND TableName = 'All';

(The TableName = 'All' filter matters at the database level: DBC.AllSpace carries both per-table rows and a database-level summary row named 'All', so summing every row would double count.)

We may also need to calculate the space taken by a single table.
For that we can modify the above query as below:

SELECT SUM(CurrentPerm), SUM(MaxPerm) FROM DBC.AllSpace WHERE DatabaseName = 'DATABASE1' AND TableName = 'TESTTABLE';

This can also be achieved using the view DBC.TableSize.
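
The equivalent per-table query against DBC.TableSize would look like the sketch below (note that TableSize exposes CurrentPerm and PeakPerm but not MaxPerm, since the perm limit is a property of the database or user):

SELECT SUM(CurrentPerm) FROM DBC.TableSize WHERE DatabaseName = 'DATABASE1' AND TableName = 'TESTTABLE';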

Why does the size of a table increase in multiples of 512 bytes and not 20 bytes?

Guys, below is one of the questions I faced.
I posted this question on several blogs, and the following is the answer I received.


Question:
Hi all

While testing an issue I came across this phenomenon.

I created a table with one column of CHAR(20).
When I insert a row into this table, the size of the table increases by 512
bytes and not 20 bytes.
I believed it should increase by 20 bytes, as I am inserting data only into a
CHAR(20) field.

I am not able to understand why the size increases in multiples
of 512 bytes and not 20 bytes.

Following is what I ran:

CREATE TABLE RESTORE_TABLES.TESTTABLE, NO FALLBACK
(
FIELD1 CHAR(20)
);

/* CREATED A TABLE WITH ONLY 1 COLUMN OF CHAR(20) */


SELECT SUM(CurrentPerm) FROM DBC.ALLSPACE WHERE
DatabaseName='RESTORE_TABLES' AND TableName='TESTTABLE';

Result is 196608.00 BYTES.

INSERT INTO RESTORE_TABLES.TESTTABLE VALUES('ABC');
SELECT SUM(CurrentPerm) FROM DBC.ALLSPACE WHERE
DatabaseName='RESTORE_TABLES' AND TableName='TESTTABLE';

Result is 197120.00. Thus the increase is by 512 bytes.

INSERT INTO RESTORE_TABLES.TESTTABLE VALUES('CDF');
SELECT SUM(CurrentPerm) FROM DBC.ALLSPACE WHERE
DatabaseName='RESTORE_TABLES' AND TableName='TESTTABLE';

Result is 197632.00. Thus the increase is again by 512 bytes.

INSERT INTO RESTORE_TABLES.TESTTABLE VALUES(NULL);
SELECT SUM(CurrentPerm) FROM DBC.ALLSPACE WHERE
DatabaseName='RESTORE_TABLES' AND TableName='TESTTABLE';

Result is 198144.00. For a NULL value the increase is also 512 bytes.


Can anybody please help me understand why the size increases in multiples of 512 bytes?



Answer:
This answer is consolidated from several responses that I received to my question.

Blocks consist of sectors. A sector is 512 bytes.
So whenever we insert a distinct value, it hashes differently and hence goes to a different AMP.
On each such AMP a sector (which is 512 bytes) is allocated.
If I had inserted the same value, i.e. 'ABC', again, it would hash to the same AMP, and in that case the size would not increase by 512 bytes, as the row would be accommodated in the sector that was allocated earlier.

Tuesday 3 January 2012

How to restart a FastLoad job?

The restart process depends upon the phase in which the FastLoad job failed.
FastLoad has two phases - Phase 1 (acquisition phase) and Phase 2 (application phase).

1) If the job fails in Phase 1 and you want to continue the job after fixing the issue, then you need to resubmit the job and it will start from the last checkpoint.

2) If the job fails in Phase 1 and we don't want to continue but just need to release the lock, then simply submit a BEGIN LOADING / END LOADING pair (see the sketch below) or DROP and CREATE the table.

3) If the job fails in Phase 2 and you want to continue, then resubmit the job and it will continue loading the data.

4) If the job fails in Phase 2 and we don't want to continue, then we don't have the option of submitting END LOADING. The only way out is to DROP and RECREATE the table.


Dropping and recreating the table should not be an issue, as FastLoad only works with empty tables, and hence there is no chance of losing data.
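
A minimal sketch of the lock-release script for case 2, assuming a hypothetical staging table mydb.stage_table and error tables mydb.err1 and mydb.err2:

LOGON tdpid/username,password;
BEGIN LOADING mydb.stage_table ERRORFILES mydb.err1, mydb.err2;
END LOADING;
LOGOFF;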

What is the difference between COUNT(1) and COUNT(*)?

There is NO difference between COUNT(1) and COUNT(*).
Both will return the same number of rows.

One point to note is that COUNT(columnname) is an aggregate function and does not take NULLs into consideration.

So if table TABLE1, with one column COLUMN1, has 4 records and two of them have NULL values for COLUMN1, then SELECT COUNT(COLUMN1) will return only 2 and not 4.

  • COUNT(*) counts all occurrences, including NULLs. This is a row count.
  • COUNT(1) is like COUNT(column1): it counts all non-NULL values in a "pseudo column" created by the constant. Since the constant 1 will never be NULL, this, like COUNT(*), gives you a row count.

COUNT(*) is a row count and COUNT(1) is a count of a constant pseudo column which can never be NULL; hence in both cases the result will be the same.
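
A quick illustration using the hypothetical TABLE1 from above (4 rows, two NULLs in COLUMN1):

SELECT COUNT(*) AS cnt_star
     , COUNT(1) AS cnt_one
     , COUNT(COLUMN1) AS cnt_col
FROM TABLE1;
-- returns: 4, 4, 2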



How to combine date and time columns to form a timestamp field?

The most efficient way to combine a date and a time is to cast the date to a timestamp and add the time. But you can't simply add a TIME, because TIME is a point in time and not a duration. So you have to transform it into an interval first:

cast(Date_column as TimeStamp(6))
+ ((Time_column - time '00:00:00') hour to second(6)) 
 
Example:
  SELECT CAST(DATE'1999-12-30' AS TIMESTAMP(6))
+ ((TIME '12:12:21' - TIME '00:00:00') HOUR TO SECOND(6))
gives:
 (1999-12-30+(12:12:21 - 00:00:00) HOUR TO SECOND)
 1999-12-30 12:12:21.000000
 
We can also confirm the data type of the value returned by using TYPE

  SELECT TYPE(CAST(DATE'1999-12-30' AS TIMESTAMP(6))
+ ((TIME '12:12:21' - TIME '00:00:00') HOUR TO SECOND(6)))
gives
 Type((1999-12-30+(12:12:21 - 00:00:00) HOUR TO SECOND))
 TIMESTAMP(6)

The main point to note here is that we can't simply add a TIME value.
First we need to convert it to an interval, and only then add it to the timestamp.
(Courtesy: Teradata Forum)

Working with Teradata TIME and TIMESTAMP in non-conventional formats

Working with TIME and TIMESTAMP values in non-conventional formats is a big headache.
The simple technique below will ease this.

1) For TIME

'08-09-17PM' (time, format 'HH-MI-SST')
Here we explicitly specify the format.

2) For Timestamp

'12/25/1994 11:46:29PM' (timestamp, format 'MM/DD/YYYYBHH:MI:SST')
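
Used inside a query, the same literals would look like the sketch below (Teradata allows a SELECT without a FROM clause):

SELECT '08-09-17PM' (TIME, FORMAT 'HH-MI-SST') AS t
     , '12/25/1994 11:46:29PM' (TIMESTAMP, FORMAT 'MM/DD/YYYYBHH:MI:SST') AS ts;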

Teradata - Finding the 1st and last day of a month using SQL

Most of the time our queries need filtering conditions to fetch data only for one month.
In such cases it is very important to find the first and last day of the month.
Below is shown how to find the 1st and last date of a given month.


1) 1st day:

select datecol - extract(day from datecol) + 1

This subtracts the number of days since the beginning of the month, taking you to "day 0", or the day before the first of the month; then adds 1.

2) Last day:

select add_months((datecol - extract(day from datecol)+1),1)-1

Same idea, but this computes the first day of the following month and then subtracts 1.
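
A quick sanity check with a literal date (a sketch; any date in January 2012 works the same way):

SELECT DATE '2012-01-17' - EXTRACT(DAY FROM DATE '2012-01-17') + 1 AS first_day  /* 2012-01-01 */
     , ADD_MONTHS(DATE '2012-01-17' - EXTRACT(DAY FROM DATE '2012-01-17') + 1, 1) - 1 AS last_day;  /* 2012-01-31 */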

Teradata: Inserting date literals or comparing with date literals

One of the most difficult tasks when working with dates in Teradata is inserting date literals or comparing with date literals.
Below I have shown some very basic ways to work with date literals.

Assume that we have a table with a date field defined as below:

CREATE VOLATILE TABLE ABC, NO FALLBACK
 (
 INPUTDATE DATE FORMAT 'MMMDDYYYY'
 )
  ON COMMIT PRESERVE ROWS


Following are the ways to insert date literals or strings into this table:

1) Use the same format in which the column is defined in the table.
Here the date column format is MMMDDYYYY.
So a query like the one below will work:

  INSERT INTO ABC VALUES ('JAN122011'); /*THE FORMAT OF THE LITERAL IS SAME AS THAT DEFINED IN THE TABLE*/
  SELECT * FROM ABC;

2) Explicitly specify the format in which the date string exists.
This comes in handy especially when we are loading data through MultiLoad, where date data comes in as CHAR or VARCHAR.

  INSERT INTO ABC VALUES ('31JAN2010' (DATE,FORMAT'DDMMMYYYY')); /* NOTE THAT HERE WE EXPLICITLY SPECIFY WHAT FORMAT THE DATE STRING IS IN*/
  SELECT * FROM ABC;

3) Using the ANSI standard form
As we know, the ANSI standard date format is YYYY-MM-DD.
So we can use the query below to insert a row.


  INSERT INTO ABC VALUES (DATE '1941-12-12'); /* NOTE THAT WE HAVE TO WRITE THE DATE IN ANSI FORMAT AND ALSO USE THE KEYWORD DATE IN FRONT OF IT */
  SELECT * FROM ABC;

4) Using numeric format

As we know, DATE is internally stored as an INTEGER; that is why DATE and INTEGER each require 4 bytes of space.
So we can directly insert a numeric value as well.

INSERT INTO ABC VALUES (411207);

The date inserted will be 7th Dec 1941: the internal integer encoding is (year - 1900) * 10000 + month * 100 + day, so 411207 decodes to 1941-12-07.

All of the above also holds true when comparing a date column with a date string.

*********************************************************************************

Is there any way for the database time zone to adjust automatically with the change to DST?

The Daylight Saving Time automatic adjustment feature is available via the Time Zone (TZ) string, and only from release 13.10.

Sunday 1 January 2012

How to know all Table and View definitions used in a query?

This can be done using the SHOW command.
All we need to do is put SHOW in front of the SQL statement.

SHOW SELECT * from VIEW1;

The above statement will return all the view definitions and the table DDL involved in the query.
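
For a single object, the more specific forms also work:

SHOW TABLE mydb.table1;  /* DDL of one table (hypothetical name) */
SHOW VIEW mydb.view1;    /* definition of one view */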


How to write BTEQ output to a CSV file?

Write the SQL query as
SELECT COL1||','||COL2||','||COL3||','||.....||','||COLN FROM TABLE_NAME;


OR


.SET FORMAT ON;
.SET SEPARATOR ',';

Put these commands before the SQL statement in your BTEQ EXPORT script.
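
Putting it together, a minimal BTEQ export sketch (the file and table names are hypothetical):

.LOGON tdpid/username,password;
.EXPORT REPORT FILE = /tmp/output.csv;
.SET SEPARATOR ',';
.SET TITLEDASHES OFF;
SELECT COL1, COL2 FROM TABLE_NAME;
.EXPORT RESET;
.LOGOFF;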


How to use ARCMAIN to archive and restore partitions of a PPI table?



For ARCHIVE:

     ARCHIVE DATA TABLES
     (db_old.tablename) 
     ( PARTITIONS WHERE (! x IN(x_106) !)),
      RELEASE LOCK, 
     FILE=ARCHIVEFILE;
     LOGOFF;


The ARCHIVE command is used to archive Teradata tables. With PPI tables we have the option of archiving only a particular partition.
In the above syntax we need to specify the table name to be archived. Notice the round brackets.
For PPI tables we need to specify the partition to be archived.
The RELEASE LOCK command is necessary because the ARC utility places a HUT (Host Utility) lock on the table being archived, which needs to be released.
FILE indicates the file in which the archive needs to be stored.
Notice the semicolon ending the command.


For COPY:

     COPY DATA TABLES
     ( db_new.tablename_XXX  ) (from ( db_old.tablename )),
       RELEASE LOCK,
       FILE=ARCHIVE;
     LOGOFF;

db_new.tablename_XXX is the name of the new table being created. This may be needed if we are restoring the data on the same database system as the original table; in that case we need to provide a new table name.
FROM is used to specify which table to restore from the archive file. A single archive may contain multiple archived tables.
RELEASE LOCK is used to release the HUT lock.
FILE is used to indicate the file which contains the archived tables.
If we are restoring into the same table name and there are no other tables in the archive job, then it is not required to give FROM in the COPY syntax:
     COPY DATA TABLES
     ( db_new.tablename ),
     RELEASE LOCK,
     FILE=ARCHIVE;
     LOGOFF;


Additional Examples:

For Archive:
     .logon TDPID/username,passwd;
     ARCHIVE DATA TABLES
     (Students.orders_PPI_M)
     (PARTITIONS WHERE (! o_orderdate = DATE '2000-01-25' !)),
     RELEASE LOCK,
     FILE=ARCHIVE;
     .logoff;
For Copy:
     .logon TDPID/username,passwd;
     COPY DATA TABLES
     (Students.orders_rst )   (from(Students.orders_PPI_M )),
     RELEASE LOCK,
     FILE=ARCHIVE;
     .LOGOFF;

How to restart BTEQ from the query where it failed, thus skipping all the earlier successful queries?

1) Alternative 1
Create an empty file before you run BTEQ (or use an OS command inside the script to touch an empty file),
and write the BTEQ script as below:

     
     .logon tdpid/username,password;
     .OS if [ ! -f bteq_restart_step.txt ]; then touch bteq_restart_step.txt; fi;
     .run file bteq_restart_step.txt;

     .label sql1;
     DML statement1 

     .if errorcode <> 0 then .goto EXIT_ERROR;
     .if errorcode = 0 then .os echo '.goto sql2' > bteq_restart_step.txt;

     .label sql2;   
     DML statement2
     .if errorcode <> 0 then .goto EXIT_ERROR;
     .if errorcode = 0 then .os echo '.goto sql3' > bteq_restart_step.txt;

     .label sql3;
     DML statement3

     .if errorcode <> 0 then .goto EXIT_ERROR; 
     .if errorcode = 0 then .os echo '.goto sql4' >  bteq_restart_step.txt;

     .label sql4;
     DML statement4

     .os rm -f bteq_restart_step.txt;
     .quit 0;

     .label EXIT_ERROR;
     .quit 99;

If the script completes sql1 and fails in statement sql2, then the file bteq_restart_step.txt will contain '.goto sql2'. So when we restart the script, it will execute the '.run file' command and jump straight to the previously failed step.

2) Alternative 2
Create a wrapper script that accepts a step number from the user and then writes the appropriate '.goto' step into bteq_restart_step.txt, so the job resumes from wherever it has to be restarted.
 
3) Alternative 3

Create a multi-statement request as follows:

DML statement1
;DML statement2
;DML statement3

This will form a single transaction, and hence it will either run completely or not at all.
In this case you won't have to worry about the restart logic. Just submit the script again.

What is a Two-Phase Commit (2PC) in Teradata?

Two-phase commit comes into the picture when there are multiple database systems involved in a transaction.
Example: Teradata and DB2.
Assume a transaction that requires inserts into tables in both Teradata and DB2. If it fails in either of the database systems, then it should not get committed in the other DB either. This is where 2PC is used.

In 2PC, one system acts as the coordinator while the others act as participants.
When a transaction occurs, requests are sent to each of the participants.
Each participant checks for constraints, access violations, etc., and then, depending on whether it is able to commit the changes, sends its vote to the coordinator.
If even one participant is not able to commit the changes, then the transaction is rolled back on every database system.

Only when every participant votes OK does the transaction get committed.
This is how 2PC works.

During the window between a participant voting its status and the coordinator confirming the commit, the participant is said to be "in doubt".

Why does FastLoad not allow duplicate rows?

The reason behind FastLoad not allowing duplicate rows is the restart logic.
If FastLoad fails for some reason and we restart the utility, it starts sending rows from the last checkpoint.
FastLoad doesn't have/store any information about the input record sequence like MultiLoad's Match Tag (ApplySeq+DMLSeq+ImportSeq+SMTSeq+SourceSeq).
So when it is restarted, it simply doesn't know whether a row was a duplicate within the data or was sent twice because of the restarted FastLoad (in the application phase).


If FastLoad were able to load MULTISET tables like MultiLoad can, there would be more overhead regarding perm space.


Another line of thought is that FastLoad is older than MULTISET tables, and there's no reason to add that feature as long as there's MultiLoad.

How to cast a string date to TIMESTAMP?

When we are dealing with a string, we need to cast the date string to DATE and afterwards cast the DATE to a TIMESTAMP.


Example:


select cast(cast('2011-07-11' as date) as timestamp(0));

Note that the above assumes the 'YYYY-MM-DD' date format.

If the date string has a different format, then try the following:

select cast(('2011/07/11' (date, format 'YYYY/MM/DD')) as timestamp(0));

Here you can explicitly specify your date format.
Additional Examples:

     Insert into dt
     Select cast(cast('2011/07/11' as date) as timestamp(6));

     Insert into dt
     Select cast(cast('2011/07/11' as date format 'yyyy/mm/dd') as timestamp(6));