Quick Tip: Use a “Seed” Table to Populate a Table with Random Values

Posted May 26, 2021 by James Knicely, Vertica Field Chief Technologist

Hand writing the text: Helpful Tips

Oftentimes, I need to do some testing with randomly generated data. For instance, if I need to insert a million rows I used to cross join several system tables with one another in an INSERT statement.

Method 1 (bad): CROSS JOIN

Suppose we need a table full of many random UUIDs. We could do this with UUID_GENERATE() and a CROSS JOIN:

verticademos=> CREATE TABLE generated_data (c UUID);
CREATE TABLE

verticademos=> INSERT INTO generated_data SELECT UUID_GENERATE() FROM tables CROSS JOIN columns;
OUTPUT
--------
    220
(1 row)

Unfortunately, this method lacks in precision and magnitude. First, we can’t predict the number of rows in the TABLES or COLUMNS system tables. Second, we only managed to insert 220 rows – what if we need a lot more?

Method 2 (better): “Seed” table

A better approach is to create a “seed” table with a bunch of rows.

verticademos=> CREATE TABLE seed AS SELECT ROW_NUMBER() OVER() c FROM (SELECT ts FROM (SELECT '01-JAN-0000'::TIMESTAMP AS tm UNION SELECT '31-DEC-2100'::TIMESTAMP as tm) AS t TIMESERIES ts AS '1 SECOND' OVER (ORDER BY tm) LIMIT 100000000) foo;
CREATE TABLE

verticademos=> SELECT COUNT(*), MIN(c), MAX(c) FROM seed;
   COUNT   | MIN |    MAX
-----------+-----+-----------
100000000 |   1 | 100000000
(1 row)

Now we can generate a lot more data in a single INSERT:

verticademos=> INSERT INTO generated_data SELECT uuid_generate() FROM seed;
  OUTPUT
-----------
100000000
(1 row)

verticademos=> SELECT * FROM generated_data LIMIT 5;
                  c
--------------------------------------
26b2863f-ab3b-405a-ab65-df4fa6469990
00001133-ec4b-4a99-8fc4-30b6e71cdeab
000047d9-0b54-4282-a55a-bc1176292a77
000049d6-329e-4475-81b7-194967bff2f2
00007f03-3996-4c7f-8332-d55fd9fc2249
(5 rows)

But what about precision? We can generate a specific number of rows by simply applying a LIMIT clause!

verticademos=> INSERT INTO generated_data SELECT uuid_generate() FROM seed LIMIT 50000;
OUTPUT
--------
  50000
(1 row)

verticademos=> INSERT INTO generated_data SELECT uuid_generate() FROM seed LIMIT 8675309;
  OUTPUT
----------
  8675309
(1 row)

Have fun!