Monday, 13 May 2013

Teradata Sampling-- SAMPLE WITH REPLACEMENT


Sample with Replacement:

Assume a table employee with just 14 rows in it.
Now if we execute the following query we will still get result of only 14 rows.

select        *
from        employee
sample         50;

Even though we have asked for sample of 50 rows , because the table has only 14 rows we will get only 14 rows in return. This is because SAMPLE by default does not allow duplication of the rows. Once the row is returned then it is taken out of the pool of rows.

However if we use the WITH REPLACEMENT option we get ability to sample the row and return it back to the pool of rows from which it came, making it candidate for samplings.
Ex:

select        *
from        employee
sample        with replacement 50;

Above query will return 50 rows even if it has to repeat the same rows again and again.

Note that here we write 'SAMPLE WITH REPLACEMENT 50' and not 'SAMPLE 50 WITH REPLACEMENT'.

We can combine stratified sampling and sampling with replacement as follows:

Ex 1:

Select        employeeid,departmentno,sampleid
From        employee2
SAMPLE WITH REPLACEMENT        
WHEN        departmentno=100 then .40
WHEN        departmentno=200 then .25
WHEN        departmentno=300 then .88
WHEN        departmentno=400 then .35


END        ;


Employeeid
DepartmentNo
SampleId
1
100
1
133
100
1
130
100
1
10
200
2
6
200
2
152
300
3
150
300
3
7
300
3
3
300
3
151
300
3
144
400
4
4
400
4
135
400
4


Ex 2:

Select        employeeid,departmentno,sampleid
From        employee2
SAMPLE WITH REPLACEMENT        
WHEN        departmentno=100 then 4,2
WHEN        departmentno=300 then 4,4
END        ;

Employeeid
DepartmentNo
SampleId
131
100
1
5
100
1
1
100
1
130
100
1
131
100
2
1
100
2
151
300
3
7
300
3
3
300
3
150
300
3
150
300
4
3
300
4
151
300
4
7
300
4

We can see that multiple rows are duplicated across samples.
Same row appears in sample 3 and sample 4.


No comments:

Post a Comment