COPY Examples
The following examples show how to load data with the COPY statement using various string options.
The FORMAT, DELIMITER, NULL, and ENCLOSED BY options:
=> COPY public.customer_dimension (customer_since FORMAT 'YYYY') FROM STDIN DELIMITER ',' NULL AS 'null' ENCLOSED BY '"';
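As a sketch of what the STDIN input for this statement might look like, the following Python snippet (not part of the Vertica documentation; the column layout and values are hypothetical) generates rows that honor the options above: comma delimiters, the literal string null for NULL, and double-quote enclosure for values that contain the delimiter.

```python
import csv
import io

# Hypothetical rows: customer_since uses FORMAT 'YYYY' (a bare four-digit
# year), and the literal string 'null' loads as SQL NULL per NULL AS 'null'.
rows = [
    ["2004", "Acme, Inc."],   # embedded comma forces ENCLOSED BY '"' quoting
    ["null", "Widgets"],      # matches NULL AS 'null'
]

buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
print(buf.getvalue(), end="")
```

Fed to the COPY statement on STDIN, the first line would load with a customer_since of 2004 and the second with customer_since NULL.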
The DELIMITER, NULL, and DIRECT options:
=> COPY a FROM STDIN DELIMITER ',' NULL E'\\\N' DIRECT;
The DELIMITER, NULL, and RECORD TERMINATOR options:
=> COPY store.store_dimension FROM :input_file DELIMITER '|' NULL '' RECORD TERMINATOR E'\f';
Setting vsql Variables
The first two examples load data from STDIN. The last example uses a vsql variable (input_file). You can set a vsql variable as follows:
=> \set input_file ../myCopyFromLocal/large_table.gzip
Including Multiple Source Files
COPY supports the inclusion of multiple source files in a single COPY statement.
The following example creates a table named sampletab and then loads multiple source files into it with a single COPY statement.
=> CREATE TABLE sampletab (a int);
CREATE TABLE
=> COPY sampletab FROM '/home/dbadmin/one.dat', '/home/dbadmin/two.dat';
 Rows Loaded
-------------
           2
(1 row)
You can use wildcards to indicate a group of files:
=> COPY myTable FROM 'hdfs:///mydirectory/ofmanyfiles/*.dat';
Wildcards can include regular expressions:
=> COPY myTable FROM 'hdfs:///mydirectory/*_[0-9]';
You can use multiple paths in a single COPY statement:
=> COPY myTable FROM 'hdfs:///data/sales/01/*.dat', 'hdfs:///data/sales/02/*.dat', 'hdfs:///data/sales/historical.dat';
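In these patterns, * matches any run of characters and a bracket expression such as [0-9] matches exactly one character from the class. Python's fnmatch module implements comparable shell-style matching, so it can illustrate which hypothetical file names the pattern from the earlier example would select:

```python
import fnmatch

# Hypothetical file names in a source directory.
files = ["sales_1", "sales_22", "region_7", "historical.dat"]

# '*_[0-9]' matches names ending in an underscore plus a single digit.
matches = [name for name in files if fnmatch.fnmatch(name, "*_[0-9]")]
print(matches)
```

Note that sales_22 does not match, because [0-9] consumes only one character.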
Distributing a Load
The following example shows how you can load a file that resides on storage shared across all nodes. Vertica distributes the load across all nodes, if possible.
=> COPY sampletab FROM '/data/file.dat' ON ANY NODE;
This example shows how to load data from two files. Because the first load file does not specify nodes (or ON ANY NODE), the initiator performs the load. Loading the second file is distributed across all nodes.
=> COPY sampletab FROM '/data/file1.dat', '/data/file2.dat' ON ANY NODE;
This example shows how to specify different nodes for each load file. Vertica distributes the load of file1.dat across v_vmart_node0001 and v_vmart_node0002 and distributes the load of file2.dat across v_vmart_node0003 and v_vmart_node0004.
=> COPY sampletab FROM '/data/file1.dat' ON (v_vmart_node0001, v_vmart_node0002), '/data/file2.dat' ON (v_vmart_node0003, v_vmart_node0004);
ON ANY NODE is the default for loads from HDFS. You do not need to specify it.
Loading Data from Shared Storage
To load data from HDFS or S3, use URLs in the corresponding schemes: hdfs:///path or s3://bucket/path. Loads from HDFS and S3 default to ON ANY NODE; you do not need to specify it.
This example shows how you can load a file stored in HDFS using the default NameNode or nameservice. See Using HDFS URLs for more information about HDFS URLs.
=> COPY t FROM 'hdfs:///opt/data/file1.dat';
This example shows how you can load data from a particular HDFS name service (testNS). You specify a name service if your database is configured to read from more than one HDFS cluster.
=> COPY t FROM 'hdfs://testNS/opt/data/file2.csv';
This example shows how you can load data from an S3 bucket. See Loading from an S3 Bucket for more information.
=> COPY t FROM 's3://AWS_DataLake/*' ORC;
Loading Hadoop Native Formats
This example shows how you can load data in the ORC format from HDFS.
=> COPY t FROM 'hdfs:///opt/data/sales.orc' ORC;
This example shows how you can load Parquet data from an S3 bucket.
=> COPY t FROM 's3://AWS_DataLake/sales.parquet' PARQUET;
Loading Data into a Flex Table
This statement creates a flex table and copies JSON data into it using the flex table parser, fjsonparser:
=> CREATE FLEX TABLE darkdata();
CREATE TABLE
=> COPY darkdata FROM '/myTest/Flexible/DATA/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
-------------
          12
(1 row)
Using Named Pipes
COPY supports named pipes that follow the same naming conventions as file names on the given file system. Permissions are open, write, and close.
This statement creates the named pipe pipe1 and sets two vsql variables, dir and file:
=> \! mkfifo pipe1
=> \set dir `pwd`/
=> \set file '''':dir'pipe1'''
This statement copies an uncompressed file from the named pipe:
=> \! cat pf1.dat > pipe1 &
=> COPY large_tbl FROM :file DELIMITER '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
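Outside of vsql, the producer/consumer pattern above, where one process writes to the pipe while another reads it, can be sketched in Python on a POSIX system. The pipe name and data below are hypothetical:

```python
import os
import tempfile
import threading

# Create a named pipe, as `\! mkfifo pipe1` does in the vsql example.
path = os.path.join(tempfile.mkdtemp(), "pipe1")
os.mkfifo(path)

data = b"1|alpha\n2|beta\n"

def produce():
    # Plays the role of `cat pf1.dat > pipe1 &`: open() blocks until a
    # reader opens the other end of the pipe.
    with open(path, "wb") as pipe:
        pipe.write(data)

writer = threading.Thread(target=produce)
writer.start()

# Plays the role of COPY reading from :file.
with open(path, "rb") as pipe:
    loaded = pipe.read()
writer.join()
print(loaded.decode(), end="")
```

The reader sees end-of-file when the writer closes its end, which is why COPY knows when the load is complete.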
Loading Compressed Data
This statement copies a GZIP file from a named pipe and uncompresses it:
=> \! gzip pf1.dat
=> \! cat pf1.dat.gz > pipe1 &
=> COPY large_tbl FROM :file ON site01 GZIP DELIMITER '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! gunzip pf1.dat.gz
This statement copies a BZIP file from a named pipe and then uncompresses it:
=> \! bzip2 pf1.dat
=> \! cat pf1.dat.bz2 > pipe1 &
=> COPY large_tbl FROM :file ON site01 BZIP DELIMITER '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! bunzip2 pf1.dat.bz2
This statement copies an LZO file from a named pipe and then uncompresses it:
=> \! lzop pf1.dat
=> \! cat pf1.dat.lzo > pipe1 &
=> COPY large_tbl FROM :file ON site01 LZO DELIMITER '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! lzop -d pf1.dat.lzo