Showing posts with label Teradata RANDOM function. Show all posts
Showing posts with label Teradata RANDOM function. Show all posts

Sunday, 14 April 2013

Teradata RANDOM function, INDEX Function, POSITION Function



Random Function:

Random function can be used to generate random numbers between specified range.

Syntax: RANDOM (lower-limit, upper-limit).

The two numbers provided must be integers. Decimals are not allowed.
Numbers should be greater than 1.


SELECT EMPLOYEEID, random(1,10) FROM employee;

Employeeid
Random(1,10)
8
7
2
8
6
6
4
6
12
10
1
7
3
4
10
10
7
3
11
7
9
10
5
4

Random function is invoked for each row processed, thus duplicate are possible .
We can decrease the chances of getting duplicates by increasing the size of the pool (by increasing the random interval)


Rules for using Random functions:

  1. RANDOM can be used in a SELECT list or WHERE clause, but not in both

SELECT EMPLOYEEID, random(1,10) FROM employee where random(1,10) > 5;

SELECT Failed. 5522:  The RANDOM function can not be used in more than one SELECT query clauses. 

  1. RANDOM is not ANSI standard.

  1. RANDOM cannot be referenced by numeric position in a GROUP BY or ORDER BY clause




Creating Samples using RANDOM :

We can get samples using SAMPLE .
Say we need 50% sample of row having salary lesser than 10000 bucks.
We could use SAMPLE as shown below:

SELECT * FROM EMPLOYEE WHERE SALARY < 10000 SAMPLE .5;

Another way to get a 50 % sample using RANDOM is as follows:

SELECT * FROM EMPLOYEE WHERE SALARY < 10000 AND RANDOM(1,2) < 2;

However this technique is approximate and does not always give 50% sample.
If you run the same query multiple times you may get varying number of rows.

The reason being the random interval is very small.

To get a better consistent result we should use a bigger Interval as shown below

SELECT * FROM EMPLOYEE WHERE SALARY < 10000 AND RANDOM(1,1000) < 501;


Problem of using sample is that it can use used once in the query. But random can be used multiple times in a the same select and hence we can produce multiple samples , each having a different criteria.


SELECT * FROM EMPLOYEE WHERE
 (SALARY > 50000 AND RANDOM(1,1000) < 501)
OR

 (SALARY > 30000 AND SALARY < 50000 AND RANDOM(1,1000) < 701)

OR

 (SALARY > 5000 AND SALARY < 30000 AND RANDOM(1,1000) < 301)
;

Above query gives is
50% sample of employees having salary > 50000
70% sample of employees having salary > 30000 and salary < 50000
30% sample of employees having salary > 5000 and salary < 30000





Locating string position within a string using-  POSITION and INDEX functions:

Both the functions POSITION and INDEX are used for finding the position of a string with in a string.

POSITION is ANSI standard.
INDEX is Teradata Standard.

Both the functions need to arguments- One is the string to search in and other is the string to be searched.

Syntax:

SELECT POSITION('GOD' IN 'LIKE A GOD');  --> Result is 8

SELECT INDEX('GOD','LIKE A GOD');  --> Result is 8

Care needs to be taken when using these functions in ANSI mode. Text comparisons are case sensitive in ANSI mode.