COPY Examples
For additional examples, see the reference pages for specific parsers: DELIMITED (Parser), ORC (Parser), and PARQUET (Parser).
Specifying String Options
Use COPY with the FORMAT, DELIMITER, NULL, and ENCLOSED BY options:
=> COPY public.customer_dimension (customer_since FORMAT 'YYYY') FROM STDIN DELIMITER ',' NULL AS 'null' ENCLOSED BY '"';
Use COPY with the DELIMITER and NULL options. This example sets and references a vsql variable, input_file:
=> \set input_file ../myCopyFromLocal/large_table.gzip
=> COPY store.store_dimension FROM :input_file DELIMITER '|' NULL '' RECORD TERMINATOR E'\f';
Including Multiple Source Files
Create table sampletab, and then copy multiple source files to the table:
=> CREATE TABLE sampletab (a int);
CREATE TABLE
=> COPY sampletab FROM '/home/dbadmin/one.dat', '/home/dbadmin/two.dat';
 Rows Loaded
-------------
           2
(1 row)
Use wildcards to indicate a group of files:
=> COPY myTable FROM 'webhdfs:///mydirectory/ofmanyfiles/*.dat';
Wildcards can include regular expressions:
=> COPY myTable FROM 'webhdfs:///mydirectory/*_[0-9]';
Specify multiple paths in a single COPY statement:
=> COPY myTable FROM 'webhdfs:///data/sales/01/*.dat', 'webhdfs:///data/sales/02/*.dat', 'webhdfs:///data/sales/historical.dat';
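As a quick sanity check outside of Vertica, glob-style patterns like the ones above can be previewed in a shell against local file names. The file names below are invented purely for illustration:

```shell
# Scratch directory with invented file names.
dir=$(mktemp -d)
cd "$dir"
touch sales_1 sales_2 sales_x historical.dat

# *_[0-9] matches names ending in an underscore plus a single digit.
echo *_[0-9]          # sales_1 sales_2

# *.dat matches only the .dat files.
echo *.dat            # historical.dat
```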
Distributing a Load
Load data from a source file that all nodes can access. Vertica distributes the load across all nodes, if possible:
=> COPY sampletab FROM '/data/file.dat' ON ANY NODE;
Load data from two files. Because the first load file does not specify nodes (or ON ANY NODE), the initiator performs the load. Loading the second file is distributed across all nodes.
=> COPY sampletab FROM '/data/file1.dat', '/data/file2.dat' ON ANY NODE;
Specify different nodes for each load file. COPY distributes the load of the two files as follows:
file1.dat on nodes v_vmart_node0001 and v_vmart_node0002
file2.dat on nodes v_vmart_node0003 and v_vmart_node0004
=> COPY sampletab FROM '/data/file1.dat' ON (v_vmart_node0001, v_vmart_node0002), '/data/file2.dat' ON (v_vmart_node0003, v_vmart_node0004);
Loading Data from Shared Storage
To load data from HDFS, S3, Google Cloud, or Azure, use URLs in the corresponding schemes: webhdfs:///path, s3://bucket/path, gs://bucket/path, and azb://bucket/path, respectively.
Loads from HDFS, S3, GCS, and Azure default to ON ANY NODE; you do not need to specify it.
Load a file stored in HDFS using the default name node or name service. See HDFS File System for more information about HDFS URLs.
=> COPY t FROM 'webhdfs:///opt/data/file1.dat';
Load data from a particular HDFS name service (testNS). You specify a name service if your database is configured to read from more than one HDFS cluster.
=> COPY t FROM 'webhdfs://testNS/opt/data/file2.csv';
Load data from an S3 bucket. See S3 Object Store for more information.
=> COPY t FROM 's3://AWS_DataLake/*' ORC;
Loading Hadoop Native Formats
Load data in the ORC format from HDFS:
=> COPY t FROM 'webhdfs:///opt/data/sales.orc' ORC;
Load Parquet data from an S3 bucket:
=> COPY t FROM 's3://AWS_DataLake/sales.parquet' PARQUET;
Using Filler Columns
In the following example, the table has columns for first name, last name, and full name, but the data being loaded contains columns for first, middle, and last names. The COPY statement reads all of the source data but only loads the source columns for first and last names. It constructs the data for the full name by concatenating each of the source data columns, including the middle name. The middle name is read as a FILLER column so it can be used in the concatenation, but is ignored otherwise. (There is no table column for middle name.)
=> CREATE TABLE names(first VARCHAR(20), last VARCHAR(20), full VARCHAR(60));
CREATE TABLE
=> COPY names(first,
              middle FILLER VARCHAR(20),
              last,
              full AS first||' '||middle||' '||last)
   FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Marc|Gregory|Smith
>> Sue|Lucia|Temp
>> Jon|Pete|Hamilton
>> \.
=> SELECT * FROM names;
 first |   last   |        full
-------+----------+--------------------
 Jon   | Hamilton | Jon Pete Hamilton
 Marc  | Smith    | Marc Gregory Smith
 Sue   | Temp     | Sue Lucia Temp
(3 rows)
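The transformation COPY performs here (read three pipe-delimited source columns, store the first and last, and derive the third column by concatenation) can be sketched outside Vertica with a one-line awk equivalent over the same input; the awk program below is an illustration, not part of the COPY syntax:

```shell
# Read first|middle|last; emit first|last|"first middle last",
# mirroring the FILLER-plus-concatenation load above.
printf 'Marc|Gregory|Smith\n' |
awk -F'|' '{ print $1 "|" $3 "|" $1 " " $2 " " $3 }'
# → Marc|Smith|Marc Gregory Smith
```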
Loading Data into a Flex Table
Create a flex table and copy JSON data into it, using the flex table parser fjsonparser:
=> CREATE FLEX TABLE darkdata();
CREATE TABLE
=> COPY darkdata FROM '/myTest/Flexible/DATA/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
-------------
          12
(1 row)
Using Named Pipes
COPY supports named pipes that follow the same naming conventions as file names on the given file system. Permissions are open, write, and close.
Create named pipe pipe1, and set two vsql variables, dir and file:
=> \! mkfifo pipe1
=> \set dir `pwd`/
=> \set file '''':dir'pipe1'''
Copy an uncompressed file from the named pipe:
=> \! cat pf1.dat > pipe1 &
=> COPY large_tbl FROM :file delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
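Outside of Vertica, the named-pipe pattern these examples rely on can be sketched in plain shell: one process writes into the FIFO in the background while another (COPY, in the Vertica case) reads from the same path. The file names below are illustrative:

```shell
# Create a FIFO and a small pipe-delimited data file (names are illustrative).
dir=$(mktemp -d)
mkfifo "$dir/pipe1"
printf '1|a\n2|b\n' > "$dir/pf1.dat"

# Writer: stream the file into the pipe in the background.
cat "$dir/pf1.dat" > "$dir/pipe1" &

# Reader: consume from the same path; both sides exit when the writer closes.
cat "$dir/pipe1"
wait
```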
Loading Compressed Data
Copy a GZIP file from a named pipe and uncompress it:
=> \! gzip pf1.dat
=> \! cat pf1.dat.gz > pipe1 &
=> COPY large_tbl FROM :file ON site01 GZIP delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! gunzip pf1.dat.gz
Copy a BZIP file from a named pipe and then uncompress it:
=> \! bzip2 pf1.dat
=> \! cat pf1.dat.bz2 > pipe1 &
=> COPY large_tbl FROM :file ON site01 BZIP delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! bunzip2 pf1.dat.bz2
Copy an LZO file from a named pipe and then uncompress it:
=> \! lzop pf1.dat
=> \! cat pf1.dat.lzo > pipe1 &
=> COPY large_tbl FROM :file ON site01 LZO delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! lzop -d pf1.dat.lzo
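All three compressed-pipe examples follow one pattern: the writer pushes compressed bytes into the FIFO unchanged, and the reader decompresses as it consumes them (COPY does the decompression when given the GZIP, BZIP, or LZO option). A plain-shell equivalent of the gzip case, with gunzip standing in for COPY and illustrative paths:

```shell
# Create sample data and a FIFO (paths are illustrative).
dir=$(mktemp -d)
printf 'x|1\ny|2\n' > "$dir/pf1.dat"
gzip "$dir/pf1.dat"                # produces pf1.dat.gz
mkfifo "$dir/pipe1"

# Writer: stream the compressed file into the pipe.
cat "$dir/pf1.dat.gz" > "$dir/pipe1" &

# Reader: decompress while reading, as COPY ... GZIP would.
gunzip -c < "$dir/pipe1"           # prints the original rows
wait
```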