Vertica Analytics Platform Version 9.2.x Documentation

COPY Examples

The following examples show how to load data with the COPY statement using various options.

The FORMAT, DELIMITER, NULL, and ENCLOSED BY options:

=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
   FROM STDIN
   DELIMITER ','
   NULL AS 'null'
   ENCLOSED BY '"';

The DELIMITER, NULL, and DIRECT options:

=> COPY a
   FROM STDIN 
   DELIMITER ',' 
   NULL E'\\\N'
   DIRECT;

The DELIMITER, NULL, and RECORD TERMINATOR options:

=> COPY store.store_dimension
   FROM :input_file 
   DELIMITER '|' 
   NULL ''
   RECORD TERMINATOR E'\f';

Setting vsql Variables

The first two examples load data from STDIN. The last example uses a vsql variable (input_file). You can set a vsql variable as follows:

=> \set input_file ../myCopyFromLocal/large_table.gzip

Including Multiple Source Files

COPY supports the inclusion of multiple source files in a single COPY statement.

The following example creates a table named sampletab, then copies multiple source files to the table with a single COPY statement.

=> CREATE TABLE sampletab (a int);
CREATE TABLE
=> COPY sampletab FROM '/home/dbadmin/one.dat', '/home/dbadmin/two.dat';
 Rows Loaded
-------------
           2
(1 row)

You can use wildcards to indicate a group of files:

=> COPY myTable FROM 'hdfs:///mydirectory/ofmanyfiles/*.dat';

Wildcards can include regular expressions:

=> COPY myTable FROM 'hdfs:///mydirectory/*_[0-9]';

You can use multiple paths in a single COPY statement:

=> COPY myTable FROM 'hdfs:///data/sales/01/*.dat', 'hdfs:///data/sales/02/*.dat', 
	'hdfs:///data/sales/historical.dat';

Distributing a Load

The following example shows how you can load a file that is available on all nodes. Vertica distributes the load across all nodes, if possible.

=> COPY sampletab FROM '/data/file.dat' ON ANY NODE;

This example shows how to load data from two files. Because the first load file does not specify nodes (or ON ANY NODE), the initiator performs the load. Loading the second file is distributed across all nodes.

=> COPY sampletab FROM '/data/file1.dat', '/data/file2.dat' ON ANY NODE;

This example shows how to specify different nodes for each load file. Vertica distributes the load of file1.dat across v_vmart_node0001 and v_vmart_node0002 and distributes the load of file2.dat across v_vmart_node0003 and v_vmart_node0004.

=> COPY sampletab FROM '/data/file1.dat' ON (v_vmart_node0001, v_vmart_node0002), 
	'/data/file2.dat' ON (v_vmart_node0003, v_vmart_node0004);

ON ANY NODE is the default for loads from HDFS. You do not need to specify it.

Loading Data from Shared Storage

To load data from HDFS or S3, use URLs in the corresponding scheme: hdfs:///path or s3://bucket/path. Loads from HDFS and S3 default to ON ANY NODE; you do not need to specify it.

This example shows how you can load a file stored in HDFS using the default NameNode or nameservice. See Using HDFS URLs for more information about HDFS URLs.

=> COPY t FROM 'hdfs:///opt/data/file1.dat';

This example shows how you can load data from a particular HDFS name service (testNS). You specify a name service if your database is configured to read from more than one HDFS cluster.

=> COPY t FROM 'hdfs://testNS/opt/data/file2.csv';

This example shows how you can load data from an S3 bucket. See Loading from an S3 Bucket for more information.

=> COPY t FROM 's3://AWS_DataLake/*' ORC;

Loading Hadoop Native Formats

This example shows how you can load data in the ORC format from HDFS.

=> COPY t FROM 'hdfs:///opt/data/sales.orc' ORC;

This example shows how you can load Parquet data from an S3 bucket.

=> COPY t FROM 's3://AWS_DataLake/sales.parquet' PARQUET;

Loading Data into a Flex Table

This statement creates a flex table and copies JSON data into it, using the flex table parser, fjsonparser:

=> CREATE FLEX TABLE darkdata();
CREATE TABLE
=> COPY darkdata FROM '/myTest/Flexible/DATA/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
-------------
          12
(1 row)

Using Named Pipes

COPY supports named pipes that follow the same naming conventions as file names on the given file system. The necessary permissions are open, write, and close.

This statement creates the named pipe, pipe1, and sets two vsql variables, dir and file:

=> \! mkfifo  pipe1
=> \set dir `pwd`/ 
=> \set file '''':dir'pipe1''''

This statement copies an uncompressed file from the named pipe:

=> \! cat pf1.dat > pipe1 & 
=> COPY large_tbl FROM :file delimiter '|'; 
=> SELECT * FROM large_tbl; 
=> COMMIT;
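Outside of vsql, the producer/consumer mechanics the example relies on can be sketched in shell. The file names below are placeholders, not the ones from the example above:

```shell
# Minimal sketch of the named-pipe pattern behind the COPY example:
# a background writer feeds the pipe while a reader drains it.
workdir=$(mktemp -d)
mkfifo "$workdir/pipe1"                      # create the named pipe

printf '1|one\n2|two\n' > "$workdir/pipe1" & # background writer (plays the role of: cat pf1.dat > pipe1 &)
cat "$workdir/pipe1"                         # reader (plays the role of: COPY large_tbl FROM :file)

wait                                         # reap the background writer
rm -r "$workdir"
```

The writer blocks on opening the pipe until a reader opens it, so COPY begins consuming data as soon as both ends are attached.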

Loading Compressed Data

This statement copies a GZIP file from a named pipe and uncompresses it:

=> \! gzip pf1.dat
=> \! cat pf1.dat.gz > pipe1 &
=> COPY large_tbl FROM :file ON site01 GZIP delimiter '|'; 
=> SELECT * FROM large_tbl; 
=> COMMIT;
=> \! gunzip pf1.dat.gz
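The same pattern extends to compressed streams. A shell sketch of the compress-and-stream step (placeholder file names), with gunzip standing in for what COPY's GZIP option does on the read side:

```shell
# Compress data, stream it through a named pipe, and decompress on the read side.
workdir=$(mktemp -d)
mkfifo "$workdir/pipe1"

printf '1|one\n' | gzip > "$workdir/data.gz"   # compress (plays the role of: gzip pf1.dat)
cat "$workdir/data.gz" > "$workdir/pipe1" &    # background writer feeds the pipe
gunzip -c < "$workdir/pipe1"                   # reader decompresses, as COPY ... GZIP does

wait
rm -r "$workdir"
```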

This statement copies a BZIP file from a named pipe and then uncompresses it:

=> \! bzip2 pf1.dat
=> \! cat pf1.dat.bz2 > pipe1 &
=> COPY large_tbl FROM :file ON site01 BZIP delimiter '|'; 
=> SELECT * FROM large_tbl; 
=> COMMIT;
=> \! bunzip2 pf1.dat.bz2

This statement copies an LZO file from a named pipe and then uncompresses it:

=> \! lzop pf1.dat
=> \! cat pf1.dat.lzo > pipe1 &
=> COPY large_tbl FROM :file ON site01 LZO delimiter '|'; 
=> SELECT * FROM large_tbl; 
=> COMMIT;
=> \! lzop -d pf1.dat.lzo