Blog Archive

Sunday, 14 April 2013

Teradata RANDOM function, INDEX Function, POSITION Function



Random Function:

Random function can be used to generate random numbers between specified range.

Syntax: RANDOM (lower-limit, upper-limit).

The two numbers provided must be integers. Decimals are not allowed.
Numbers should be greater than 1.


SELECT EMPLOYEEID, random(1,10) FROM employee;

Employeeid
Random(1,10)
8
7
2
8
6
6
4
6
12
10
1
7
3
4
10
10
7
3
11
7
9
10
5
4

Random function is invoked for each row processed, thus duplicate are possible .
We can decrease the chances of getting duplicates by increasing the size of the pool (by increasing the random interval)


Rules for using Random functions:

  1. RANDOM can be used in a SELECT list or WHERE clause, but not in both

SELECT EMPLOYEEID, random(1,10) FROM employee where random(1,10) > 5;

SELECT Failed. 5522:  The RANDOM function can not be used in more than one SELECT query clauses. 

  1. RANDOM is not ANSI standard.

  1. RANDOM cannot be referenced by numeric position in a GROUP BY or ORDER BY clause




Creating Samples using RANDOM :

We can get samples using SAMPLE .
Say we need 50% sample of row having salary lesser than 10000 bucks.
We could use SAMPLE as shown below:

SELECT * FROM EMPLOYEE WHERE SALARY < 10000 SAMPLE .5;

Another way to get a 50 % sample using RANDOM is as follows:

SELECT * FROM EMPLOYEE WHERE SALARY < 10000 AND RANDOM(1,2) < 2;

However this technique is approximate and does not always give 50% sample.
If you run the same query multiple times you may get varying number of rows.

The reason being the random interval is very small.

To get a better consistent result we should use a bigger Interval as shown below

SELECT * FROM EMPLOYEE WHERE SALARY < 10000 AND RANDOM(1,1000) < 501;


Problem of using sample is that it can use used once in the query. But random can be used multiple times in a the same select and hence we can produce multiple samples , each having a different criteria.


SELECT * FROM EMPLOYEE WHERE
 (SALARY > 50000 AND RANDOM(1,1000) < 501)
OR

 (SALARY > 30000 AND SALARY < 50000 AND RANDOM(1,1000) < 701)

OR

 (SALARY > 5000 AND SALARY < 30000 AND RANDOM(1,1000) < 301)
;

Above query gives is
50% sample of employees having salary > 50000
70% sample of employees having salary > 30000 and salary < 50000
30% sample of employees having salary > 5000 and salary < 30000





Locating string position within a string using-  POSITION and INDEX functions:

Both the functions POSITION and INDEX are used for finding the position of a string with in a string.

POSITION is ANSI standard.
INDEX is Teradata Standard.

Both the functions need to arguments- One is the string to search in and other is the string to be searched.

Syntax:

SELECT POSITION('GOD' IN 'LIKE A GOD');  --> Result is 8

SELECT INDEX('GOD','LIKE A GOD');  --> Result is 8

Care needs to be taken when using these functions in ANSI mode. Text comparisons are case sensitive in ANSI mode.

No comments:

Post a Comment