COPY Examples

For additional examples, see the reference pages for specific parsers: DELIMITED (Parser), ORC (Parser), and PARQUET (Parser).

Specifying String Options

Use COPY with FORMAT, DELIMITER, NULL, and ENCLOSED BY options:

=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
   FROM STDIN
   DELIMITER ','
   NULL AS 'null'
   ENCLOSED BY '"';
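
With these options, a field enclosed in double quotes loads as a single value even if it contains the delimiter, and the literal string null loads as SQL NULL. A minimal sketch of the behavior, using a hypothetical two-column table t1:

=> CREATE TABLE t1 (name VARCHAR(30), city VARCHAR(30));
=> COPY t1 FROM STDIN DELIMITER ',' NULL AS 'null' ENCLOSED BY '"';
>> "Remillard, Jean",Boston
>> Smith,null
>> \.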

Use COPY with the DELIMITER, NULL, and RECORD TERMINATOR options. This example sets and then references the vsql variable input_file:

=> \set input_file '''../myCopyFromLocal/large_table.gzip'''
=> COPY store.store_dimension
   FROM :input_file
   DELIMITER '|'
   NULL ''
   RECORD TERMINATOR E'\f';

Including Multiple Source Files

Create table sampletab, and then copy multiple source files to the table:

=> CREATE TABLE sampletab (a int);
CREATE TABLE
=> COPY sampletab FROM '/home/dbadmin/one.dat', '/home/dbadmin/two.dat';
 Rows Loaded
-------------
           2
(1 row)

Use wildcards to indicate a group of files:

=> COPY myTable FROM 'webhdfs:///mydirectory/ofmanyfiles/*.dat';

Wildcards can include regular expressions:

=> COPY myTable FROM 'webhdfs:///mydirectory/*_[0-9]';

Specify multiple paths in a single COPY statement:

=> COPY myTable FROM 'webhdfs:///data/sales/01/*.dat', 'webhdfs:///data/sales/02/*.dat',
   'webhdfs:///data/sales/historical.dat';

Distributing a Load

Load data from a file that is available on all nodes. Vertica distributes the load across all nodes, if possible:

=> COPY sampletab FROM '/data/file.dat' ON ANY NODE;

Load data from two files. Because ON ANY NODE applies only to the second file, the initiator node loads the first file, while the load of the second file is distributed across all nodes:

=> COPY sampletab FROM '/data/file1.dat', '/data/file2.dat' ON ANY NODE;

Specify different nodes for each load file. The following COPY statement distributes the load of the two files as follows:

  • file1.dat on nodes v_vmart_node0001 and v_vmart_node0002
  • file2.dat on nodes v_vmart_node0003 and v_vmart_node0004

=> COPY sampletab FROM '/data/file1.dat' ON (v_vmart_node0001, v_vmart_node0002),
   '/data/file2.dat' ON (v_vmart_node0003, v_vmart_node0004);

Loading Data from Shared Storage

To load data from HDFS, S3, Google Cloud Storage, or Azure, use URLs in the corresponding schemes: webhdfs:///path, s3://bucket/path, gs://bucket/path, and azb://bucket/path, respectively.

Loads from HDFS, S3, GCS, and Azure default to ON ANY NODE; you do not need to specify it.

Load a file stored in HDFS using the default name node or name service. See HDFS File System for more information about HDFS URLs.

=> COPY t FROM 'webhdfs:///opt/data/file1.dat';

Load data from a particular HDFS name service (testNS). You specify a name service if your database is configured to read from more than one HDFS cluster.

=> COPY t FROM 'webhdfs://testNS/opt/data/file2.csv';

Load data from an S3 bucket. See S3 Object Store for more information.

=> COPY t FROM 's3://AWS_DataLake/*' ORC;
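
The gs and azb schemes work the same way. A brief sketch, using hypothetical bucket and container names:

=> COPY t FROM 'gs://my_gcs_bucket/data/*.dat';
=> COPY t FROM 'azb://my_azure_container/data/*.dat';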

Loading Hadoop Native Formats

Load data in the ORC format from HDFS:

=> COPY t FROM 'webhdfs:///opt/data/sales.orc' ORC;

Load Parquet data from an S3 bucket:

=> COPY t FROM 's3://AWS_DataLake/sales.parquet' PARQUET;

Using Filler Columns

In the following example, the table has columns for first name, last name, and full name, but the data being loaded contains columns for first, middle, and last names. The COPY statement reads all of the source data but only loads the source columns for first and last names. It constructs the data for the full name by concatenating each of the source data columns, including the middle name. The middle name is read as a FILLER column so it can be used in the concatenation, but is ignored otherwise. (There is no table column for middle name.)

=> CREATE TABLE names(first VARCHAR(20), last VARCHAR(20), full VARCHAR(60));
CREATE TABLE
=> COPY names(first,
              middle FILLER VARCHAR(20),
              last,
              full AS first||' '||middle||' '||last)
      FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Marc|Gregory|Smith
>> Sue|Lucia|Temp
>> Jon|Pete|Hamilton
>> \.
=> SELECT * from names;
 first |   last   |        full        
-------+----------+--------------------
 Jon   | Hamilton | Jon Pete Hamilton
 Marc  | Smith    | Marc Gregory Smith
 Sue   | Temp     | Sue Lucia Temp
(3 rows)

Loading Data into a Flex Table

Create a flex table and copy JSON data into it using the flex table parser fjsonparser:

=> CREATE FLEX TABLE darkdata();
CREATE TABLE
=> COPY darkdata FROM '/myTest/Flexible/DATA/tweets_12.json' parser fjsonparser();
 Rows Loaded
-------------
          12
(1 row)
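
To inspect the loaded data, you can render the flex table's raw column as readable text. A brief sketch, assuming the standard flex table MAPTOSTRING function:

=> SELECT maptostring(__raw__) FROM darkdata LIMIT 1;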

Using Named Pipes

COPY supports named pipes that follow the same naming conventions as file names on the given file system. Permissions on the pipe must allow it to be opened, written to, and closed.

Create named pipe pipe1 and set two vsql variables, dir and file:

=> \! mkfifo pipe1
=> \set dir `pwd`/
=> \set file '''':dir'pipe1'''

Copy an uncompressed file from the named pipe:

=> \! cat pf1.dat > pipe1 &
=> COPY large_tbl FROM :file DELIMITER '|';
=> SELECT * FROM large_tbl;
=> COMMIT;

Loading Compressed Data

Copy a GZIP file from a named pipe and then uncompress it:

=> \! gzip pf1.dat
=> \! cat pf1.dat.gz > pipe1 &
=> COPY large_tbl FROM :file ON site01 GZIP DELIMITER '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! gunzip pf1.dat.gz

Copy a BZIP file from a named pipe and then uncompress it:

=> \! bzip2 pf1.dat
=> \! cat pf1.dat.bz2 > pipe1 &
=> COPY large_tbl FROM :file ON site01 BZIP DELIMITER '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! bunzip2 pf1.dat.bz2

Copy an LZO file from a named pipe and then uncompress it:

=> \! lzop pf1.dat
=> \! cat pf1.dat.lzo > pipe1 &
=> COPY large_tbl FROM :file ON site01 LZO DELIMITER '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! lzop -d pf1.dat.lzo