Sampling Data
The goal of data sampling is to take a smaller, more manageable sample of a much larger data set. With a sample data set, you can produce predictive models or use it to help you tune your database. The following example shows how you can use the TABLESAMPLE
clause to create a sample of your data.
Sampling Data from a Table
Before you begin the example, make sure that you have loaded the Machine Learning sample data.
Using the baseball
table, create a new table named baseball_sample
containing a 25% sample of baseball
. Remember, TABLESAMPLE does not guarantee that the exact percentage of records defined in the clause are returned.
=> CREATE TABLE baseball_sample AS SELECT * FROM baseball TABLESAMPLE(25); CREATE TABLE => SELECT * FROM baseball_sample; id | first_name | last_name | dob | team | hr | hits | avg | salary -----+------------+------------+------------+------------+-----+-------+-------+------------- 4 | Amanda | Turner | 1997-12-22 | Maroon | 58 | 177 | 0.187 | 8047721 20 | Jesse | Cooper | 1983-04-13 | Yellow | 97 | 39 | 0.523 | 4252837 22 | Randy | Peterson | 1980-05-28 | Orange | 14 | 16 | 0.141 | 11827728.1 24 | Carol | Harris | 1991-04-02 | Fuscia | 96 | 12 | 0.456 | 40572253.6 32 | Rose | Morrison | 1977-07-26 | Goldenrod | 27 | 153 | 0.442 | 14510752.49 50 | Helen | Medina | 1987-12-26 | Maroon | 12 | 150 | 0.54 | 32169267.91 70 | Richard | Gilbert | 1983-07-13 | Khaki | 1 | 250 | 0.213 | 40518422.76 81 | Angela | Cole | 1991-08-16 | Violet | 87 | 136 | 0.706 | 42875181.51 82 | Elizabeth | Foster | 1994-04-30 | Indigo | 46 | 163 | 0.481 | 33896975.53 98 | Philip | Gardner | 1992-05-06 | Puce | 39 | 239 | 0.697 | 20967480.67 102 | Ernest | Freeman | 1983-10-05 | Turquoise | 46 | 77 | 0.564 | 21444463.92 . . . (227 rows)
With your sample you can create a predictive model, or tune your database.
See Also
- FROM Clause (for more information about the
TABLESAMPLE
clause) - Building a Linear Regression Model
- Building a Logistic Regression Model
- Clustering Data Using k-means