Sampling : SAMPLE and
SAMPLEID
- The
SAMPLE function is used to generate samples of data from a table or
view.
- This can be done in two ways:
- Sample n where n is integer. It will return n number of rows.(Actual number of rows).
If the number n is greater than the number of rows in the table,
the sample will consist of the number of rows in the table.
- Sample n where n is a fraction. That will return that much fraction of the table.(.ie Percentage of the table). n is a decimal value less than 1.00 and greater than .00
Note that calculations resulting in fractional rows must be
greater than .5 to actually return a row.
Example: The table employee has 12 rows. .25 sample means 4.2
rows. As the fraction value is not above .5 only 4 rows will be returned.
A .49 samele means 5.88 rows. As .88 is greater than .5 sample
of .49 will return 6 rows.
- Rows are not reused within the same sample.
If table has 12 rows . A query like following will return only
12 rows as table has only 12 rows.
select * from employee sample 15;
- Check the query as below
Select count(distinct departmentNo) from employee sample 5;
We might think that the above query is creating a sample of 5
records and the find number of distinct
department numbers in the sample. However in actual all the rows in the table
are considered and number of distinct departments as 4.
To produce correct result we would have to use a derived table
as shown below:
Select count(distinct A. departmentno) from
(Select * from employee sample 5) A;
This query gives result as 3 which is the count of distinct
department numbers within the sample.
If we run the same query again and again the result may vary as
the sample output may vary every time.
- Multiple sample sets may be generated in a single query if desired.
In order to identify the specific set a tag called the SAMPLEID is made available for association with each set.
The SAMPLEID may be selected, used for ordering, or used as a
column in a new table.
Note that When using
multiple sets rows are not repeated to different sample sets.
SAMPLEID is a keyword that assigns a sample-id to a specific
sample set.
Example:
Select EMPLOYEEID, SAMPLEID from employee sample .5,.25
Employeeid
|
SampleId
|
5
|
1
|
10
|
1
|
11
|
1
|
12
|
1
|
1
|
1
|
6
|
1
|
2
|
2
|
4
|
2
|
7
|
2
|
The sampleid indicates which sample the record belongs to. Here
6 rows belong to sample 1 and 3 rows belong to sample 2.
If the sum of sample increases beyond 1 then it results in an
error.
Select EMPLOYEEID, SAMPLEID from
employee sample .5,.6;
Error message: SELECT Failed. 5473: SAMPLE clause has
invalid set of arguments.
As mentioned earlier rows are not repeated with in different
samples, even if the there are not enough rows to return
The employee table has only 12 rows. Following is the result
when we run the below query
Select EMPLOYEEID, SAMPLEID from
employee sample 6,5,4;
Employeeid
|
SampleId
|
5
|
1
|
3
|
1
|
10
|
1
|
4
|
1
|
2
|
1
|
1
|
1
|
11
|
2
|
9
|
2
|
7
|
2
|
6
|
2
|
8
|
2
|
12
|
3
|
We get only 12 rows as output as there are no enough rows in the
table. Note that the last sample does not enough rows to return . Hence we only
have one row with sampleid 3.
"If the number n is greater than the number of rows in the table, the sample will consist of the number of rows in the table."
ReplyDeletepls revisit this comment