Vertica Analytics Platform Version 9.2.x Documentation

COPY Parameters

For details on which COPY parameters are valid for specific parsers, see COPY Option Parser Dependencies.

[database.]schema

Specifies a schema, by default public. If schema is any schema other than public, you must supply the schema name. For example:

myschema.thisDbObject

If you specify a database, it must be the current database.

COPY ignores schema-name when used in CREATE EXTERNAL TABLE or CREATE FLEX EXTERNAL TABLE statements.

target‑table

The target columnar or flexible table for loading new data. Vertica loads the data into all projections that include columns from the target table.

column‑as‑expression

Specifies the expression used to compute values for the target column. For example:

COPY t(year AS TO_CHAR(k, 'YYYY')) FROM 'myfile.dat' 

Use this option to transform data when it is loaded into the target database.

column

Restricts the load to one or more specified columns in the table. If you omit specifying columns, COPY loads all columns by default.

Table columns that you omit from the column list are assigned their DEFAULT or SET USING values, if any; otherwise, COPY inserts NULL.

If you leave the column parameter blank to load all columns in the table, you can use the optional parameter COLUMN OPTION to specify parsing options for specific columns.

The data file must contain the same number of columns as the COPY command's column list. For example, in a table T1 with nine columns (C1 through C9), the following statement loads the three columns of data in each record to columns C1, C6, and C9, respectively:

=> COPY T1 (C1, C6, C9);

FILLER datatype

Indicates that COPY should not load the identified column into the destination table. Use this option to omit columns that you do not want to transfer into a table.

The name of the filler column must not match the name of any column in the table definition. The following is an error:

=> CREATE TABLE names (firstname VARCHAR, lastname VARCHAR);
--- error:
=> COPY names (firstname FILLER VARCHAR) FROM ...

You can also use this parameter to transform data from a source column and load the transformed data to the destination table, rather than loading the original, parsed source column.
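For example, the documented pattern of combining filler fields into a single target column might look like the following sketch; the table and column names here are illustrative:

```sql
-- Hypothetical target table: stores only the concatenated full name.
-- Combined FILLER lengths (20 + 20) plus the separator fit within VARCHAR(41).
CREATE TABLE full_names (full_name VARCHAR(41));

-- first and last are FILLER fields: parsed from the input but not stored.
COPY full_names (first FILLER VARCHAR(20),
                 last  FILLER VARCHAR(20),
                 full_name AS first || ' ' || last)
   FROM STDIN;
```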

If datatype is VARCHAR, set the VARCHAR length (VARCHAR(n)) so that the combined length of all FILLER source fields does not exceed the target column's defined length. Otherwise, the COPY command can return an error.

For more information, see Manipulating Source Data Columns in the Administrator's Guide.

FORMAT 'format'

Specifies the input format to use when loading date/time and binary columns.

These are the valid input formats when loading binary columns:

  • octal
  • hex
  • bitstream

See Loading Binary (Native) Data to learn more about these formats.

When loading date/time columns, using FORMAT significantly improves load performance. COPY supports the same formats as the TO_DATE function.

If you specify invalid format strings, the COPY operation returns an error.
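As an illustration, a date column stored in a nonstandard text format can be loaded with an explicit FORMAT string; the table, column, and path names here are hypothetical:

```sql
-- Parse the second input field as a date in DD/MM/YYYY order.
COPY orders (id, order_date FORMAT 'DD/MM/YYYY')
   FROM '/data/orders.txt';
```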

path‑to‑data

Specifies the absolute path of the file (or files) containing the data, which can be from multiple input sources. If the file is stored in HDFS, path‑to‑data is a URL in the hdfs scheme, typically 'hdfs:///path/to/file'. See Using HDFS URLs. If the file is stored in an S3 bucket, path‑to‑data is a URL in the format 's3://bucket/path'. If the file is stored in Google Cloud Storage, path‑to‑data is a URL in the format 'gs://bucket/path'. If the file is on the local disk or an NFS mount, path‑to‑data is a local (Linux) file path.

path‑to‑data can optionally contain wildcards to match more than one file. The file or files must be accessible to the local client or the host on which the COPY statement runs. COPY skips empty files in the file list. A file list that includes directories causes the query to fail. See Specifying COPY FROM Options. The supported patterns for wildcards are specified in the Linux Manual Page for Glob (7), and for ADO.NET platforms, through the .NET Directory.GetFiles method.

You can use variables to construct the pathname as described in Using Load Scripts.
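For instance, a glob pattern can load every matching file in a directory; the paths and table name here are hypothetical:

```sql
-- Load all .dat files in /data/2019/; empty files in the list are skipped.
COPY sales FROM '/data/2019/*.dat' ON ANY NODE;
```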

If path resolves to a storage location on a local file system, and the user invoking COPY is not a superuser, the following privileges apply:

  • The storage location must have been created with the USER option (see CREATE LOCATION).
  • The user must already have been granted READ access to the file storage location, as described in GRANT (Storage Location)

Further, if a user has privileges but is not a superuser, and invokes COPY from that storage location, Vertica ensures that symbolic links do not result in unauthorized access.

ON nodename

Specifies the node on which the data to copy resides and the node that should parse the load file. If you omit nodename, the location of the input file defaults to the COPY initiator node. Use nodename to copy and parse a load file from a node other than the COPY initiator node.

nodename is invalid with STDIN and LOCAL: STDIN is read on the initiator node only, and LOCAL indicates a client node.

ON (nodeset)

Specifies a set of nodes on which to perform the load. The same data must be available for load on all named nodes. nodeset is a comma-separated list of node names in parentheses. For example:

=> COPY t FROM 'file1.txt' ON (v_vmart_node0001, v_vmart_node0002);

Vertica apportions the load among all of the specified nodes. If you also specify ERROR TOLERANCE or REJECTMAX, Vertica instead chooses a single node on which to perform the load.

If the data is available on all nodes, you usually use ON ANY NODE. However, you can use ON nodeset to do manual load-balancing among concurrent loads.

ON ANY NODE

Specifies that the source file to load is available on all nodes, so COPY opens the file and parses it from any node in the cluster.

The file must be the same on all nodes. If the file differs on two nodes, an incorrect or incomplete result is returned, with no error or warning.

Vertica attempts to apportion the load among several nodes if the file is large enough to benefit from apportioning. It chooses a single node if ERROR TOLERANCE or REJECTMAX is specified.

You can use a wildcard or glob (such as *.dat) to load multiple input files, combined with the ON ANY NODE clause. If you use a glob, COPY distributes the list of files to all cluster nodes and spreads the workload.

ON ANY NODE is invalid with STDIN and LOCAL. STDIN can only use the initiator node, and LOCAL indicates a client node.

ON ANY NODE is the default for HDFS and S3 paths and does not need to be specified.

STDIN

Reads data from the client's standard input instead of from a file. STDIN takes one input source only and is read on the initiator node. To load multiple input sources, use path‑to‑data.

The user must have INSERT privileges on the table and USAGE privileges on its schema.

This parameter is not valid when creating external tables.

LOCAL {STDIN | 'path‑to‑data'}

Specifies that all paths for the COPY statement are on the client system and that all COPY variants are initiated from a client. You can use LOCAL with path‑to‑data (see above) to specify a relative path.

For details, see COPY LOCAL.

This parameter is not valid when creating external tables.

input‑format

Specifies the input format, one of the following:

  • UNCOMPRESSED (default)
  • BZIP
  • GZIP
  • LZO
  • ZSTD

Input files can be of any format. If you use wildcards, all qualifying input files must be in the same format. To load different file formats, specify the format types specifically.

The following requirements and restrictions apply:

  • When using concatenated BZIP or GZIP files, verify that all source files terminate with a record terminator before concatenating them.
  • Concatenated BZIP and GZIP files are not supported for NATIVE (binary) and NATIVE VARCHAR formats.
  • LZO files are assumed to be compressed with lzop. Vertica supports the following lzop arguments:

    ‑‑no-checksum / ‑F
    ‑‑crc32
    ‑‑adler32
    ‑‑no-name / ‑n
    ‑‑name / ‑N
    ‑‑no-mode
    ‑‑no-time
    ‑‑fast
    ‑‑best
    Numbered compression levels

  • BZIP, GZIP, ZSTD, and LZO compression cannot be used with ORC format.

parser

Specifies the parser to use when bulk loading columnar tables, one of the following:

By default, COPY uses the DELIMITER parser for UTF-8 format, delimited text input data.

To use a flex table parser for column tables, use the PARSER parameter followed by a flex table parser argument. For parser descriptions, see Flex Parsers Reference.

You do not specify the DELIMITER parser directly; absence of a specific parser indicates the default.

For more information, see Specifying a COPY Parser in the Administrator's Guide.

The following restrictions apply:

  • These parsers are not applicable when loading flexible tables.
  • The ORC and PARQUET parsers are for use with Hadoop files in those formats. The files do not need to be stored in HDFS. For details on using these formats, see Reading ORC and Parquet Formats.
  • COPY LOCAL does not support NATIVE and NATIVE VARCHAR parsers.
  • To use a flex table parser for column tables, use the PARSER parameter followed by a flex table parser argument. For supported flex table parsers, see Bulk Loading Data into Flex Tables.

[WITH] UDL‑clause[…]

Specifies one or more user-defined load functions—one source, one or more filters, and one parser, as follows:

  • SOURCE source( [arg=value[,…] ] )
  • FILTER filter( [arg=value[,…] ] )
  • PARSER parser( [arg=value[,…] ] )

To use a flex table parser for column tables, use the PARSER parameter followed by a flex table parser argument. For supported flex table parsers, see Bulk Loading Data into Flex Tables.

COLUMN OPTION

Specifies load metadata for one or more columns declared in the table column list. For example, you can specify that a column has its own DELIMITER, ENCLOSED BY, NULL as 'NULL' expression, and so on. You do not have to specify every column name explicitly in the COLUMN OPTION list, but each column you specify must correspond to a column in the table column list.
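A sketch of per-column load metadata, assuming a table t with columns a and b (names and path are illustrative):

```sql
-- Column b uses ':' as its delimiter; other columns use the default '|'.
COPY t (a, b) COLUMN OPTION (b DELIMITER ':') FROM '/data/input.txt';
```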

COLSIZES (integer[,…])

Specifies column widths when loading fixed-width data. COPY requires that you specify the COLSIZES when using the FIXEDWIDTH parser. COLSIZES and the list of integers must correspond to the columns listed in the table column list. For more information, see Loading Fixed-Width Format Data in the Administrator's Guide.
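For example, fixed-width records with three fields of 2, 4, and 8 bytes could be loaded as follows; the table, columns, and path are hypothetical:

```sql
-- Each record is parsed as 2 + 4 + 8 = 14 bytes, one field per listed column.
COPY codes (region, yr, amount)
   FROM '/data/fixed.txt'
   FIXEDWIDTH COLSIZES (2, 4, 8);
```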

DELIMITER

Indicates the single ASCII character used to separate columns within each record of a file. You can use any ASCII value in the range E'\000' to E'\177', inclusive. You cannot use the same character for both the DELIMITER and NULL parameters. For more information, see Loading Delimited Data in the Administrator's Guide.

Default:  Vertical bar (|).

TRAILING NULLCOLS

Specifies that if Vertica encounters a record with insufficient data to match the columns in the table column list, COPY inserts the missing columns with NULLs. For other information and examples, see Loading Fixed-Width Format Data in the Administrator's Guide.

This option is not available for ORC or Parquet data.

ESCAPE [AS]

Sets the escape character. Once set, the character following the escape character is interpreted literally, rather than as a special character. You can define an escape character using any ASCII value in the range E'\001' to E'\177', inclusive (any ASCII character except NULL: E'\000'). Note that the backslash character ('\') is the default escape character.

The COPY statement does not interpret the data it reads in as String Literals. It also does not follow the same escape rules as other SQL statements (including the COPY parameters). When reading in data, COPY interprets only the characters defined by these options as special values:

  • ESCAPE [AS]
  • DELIMITER
  • ENCLOSED [BY]
  • RECORD TERMINATOR

NO ESCAPE

Eliminates escape character handling. Use this option if you do not need any escape character and you want to prevent characters in your data from being interpreted as escape sequences.

ENCLOSED [BY]

Sets the quote character within which to enclose data, allowing delimiter characters to be embedded in string values. You can choose any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000'). By default, ENCLOSED BY has no value, meaning data is not enclosed by any sort of quote character.

NULL

The string representing a null value. The default is an empty string (''). You can specify a null value as any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000'). You cannot use the same character for both the DELIMITER and NULL options. For more information, see Loading Delimited Data.
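The DELIMITER, ENCLOSED BY, and NULL options are often combined when loading CSV-style data; a sketch, with a hypothetical table and path:

```sql
-- Comma-delimited input, double-quoted strings, literal 'NULL' as the null marker.
COPY t FROM '/data/input.csv'
   DELIMITER ','
   ENCLOSED BY '"'
   NULL 'NULL';
```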

RECORD TERMINATOR

Specifies the literal character string indicating the end of a data file record. For more information about using this parameter, see Loading Delimited Data.

This option is not available for ORC or Parquet data.

SKIP records

Indicates the number (integer) of records to skip in a load file. For example, you can use the SKIP option to omit table header information.

This option is not available for ORC or Parquet data.
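For example, a single header record can be skipped as follows (table and path are illustrative):

```sql
-- Ignore the first record (a header row) in the input file.
COPY t FROM '/data/with_header.csv' DELIMITER ',' SKIP 1;
```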

SKIP BYTES total

Indicates the total number (integer) of bytes in a record to skip. This option is only available when loading fixed-width data.

This option is not available for ORC or Parquet data.

TRIM

Trims the number of bytes you specify from a column. This option is only available when loading fixed-width data. You can set TRIM at the table level for a column, or as part of the COLUMN OPTION parameter.

This option is not available for ORC or Parquet data.

REJECTMAX

Specifies a maximum number of logical records that can be rejected before a load fails. For details, see Capturing Load Rejections and Exceptions.

REJECTMAX disables apportioned load.

REJECTED DATA

Specifies where to write each row that failed to load. If this parameter is specified, records that failed due to parsing errors are always written. Records that failed due to an error during a transformation are written only if the CopyFaultTolerantExpressions configuration parameter is set. (See General Parameters.)

The syntax for this parameter is:

REJECTED DATA 
 { 'path' [ ON nodename ] [,…] | AS TABLE reject-table }

Vertica can write rejected data to the specified path or to a table:

  • 'path' [ ON nodename ]: Copies the rejected row data to the specified path on the node executing the load. If qualified by ON nodename, Vertica moves existing rejected data files on nodename to path on the same node.

    The value of path can be a directory or a file prefix. If there are multiple load sources, path is always treated as a directory. If there are not multiple load sources but path ends with '/', or if a directory of that name already exists, it is also treated as a directory. Otherwise, path is treated as a file prefix.

    Files are written on the node or nodes executing the load. If the file already exists, it is overwritten.

    When this parameter is used with LOCAL, the output is written to the client.

  • REJECTED DATA AS TABLE reject-table: Saves rejected rows to reject‑table.

Do not use path with ON ANY NODE. To collect all rejected data in one place regardless of how the load is distributed, use a table.

For more information about both options, see Capturing Load Rejections and Exceptions in the Administrator's Guide.
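A sketch of the table form, which collects all rejected rows in one place; the reject-table name is illustrative:

```sql
-- Rejected rows, with rejection reasons, land in t_rejects.
COPY t FROM '/data/input.txt' REJECTED DATA AS TABLE t_rejects;

-- Inspect why rows were rejected.
SELECT rejected_data, rejected_reason FROM t_rejects;
```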

EXCEPTIONS

Specifies the file name or absolute path of the file in which to write exceptions, as follows:

EXCEPTIONS 'path' [ ON nodename[,…]] 

Exceptions describe why each rejected row was rejected. Each exception describes the corresponding record in the file specified by the REJECTED DATA option.

Files are written on the node or nodes executing the load. If the file already exists, it is overwritten.

To collect all exceptions in one place, use the REJECTED DATA AS TABLE clause; exceptions are automatically listed in the table's rejected_reason column.

EXCEPTIONS is incompatible with REJECTED DATA AS TABLE.

The ON nodename clause moves existing exceptions files on nodename to the indicated path on the same node. For more information, see Saving Load Exceptions (EXCEPTIONS) in the Administrator's Guide.

If you use this parameter with COPY…ON ANY NODE, you must still specify the individual nodes for the exception files, as in the following example:

EXCEPTIONS '/home/ex01.txt' on v_db_node0001,'/home/ex02.txt' on v_db_node0002,'/home/ex03.txt' on v_db_node0003 

If path resolves to a storage location, the following privileges apply to non-superusers:

  • The storage location must be created with the USER option (see CREATE LOCATION).
  • The user must have READ access to the storage location where the files exist, as described in GRANT (Storage Location).

ENFORCELENGTH

Determines whether COPY truncates or rejects data rows of type char, varchar, binary, and varbinary if they do not fit the target table. Specifying the optional ENFORCELENGTH parameter causes COPY to reject such rows.

You can set ENFORCELENGTH at the table level for a column, or as part of the COLUMN OPTION parameter.

Default: COPY truncates offending rows of these data types, but does not reject them. For more details, see Tracking Load Exceptions and Rejections Status in the Administrator's Guide.

ERROR TOLERANCE

Specifies that COPY treats each source during execution independently when loading data. The statement is not rolled back if a single source is invalid. The invalid source is skipped and the load continues.

This option is not available for ORC or Parquet data.

Using this parameter disables apportioned load.

ABORT ON ERROR

Specifies that COPY stops if any row is rejected. The statement is rolled back and no data is loaded.

[STORAGE] load‑method

Specifies how to load data into the database, one of the following:

  • AUTO (default): Initially loads data into WOS, suitable for smaller bulk loads.
  • DIRECT: Loads data directly into ROS containers, suitable for large (>100 MB) bulk loads.
  • TRICKLE: Loads data only into WOS, suitable for frequent incremental loads.

The STORAGE keyword is optional; it is provided only to facilitate readability.

This option is invalid for external tables.

For details, see Choosing a Load Method in the Administrator's Guide.

STREAM NAME

Supplies a COPY load stream identifier. Using a stream name helps to quickly identify a particular load. The STREAM NAME value that you supply in the load statement appears in the STREAM_NAME column of the LOAD_STREAMS and LOAD_SOURCES system tables.

A valid stream name can contain any combination of alphanumeric or special characters up to 128 bytes in length.

By default, Vertica names streams by table and file name. For example, if you are loading two files (f1, f2) into TableA, their default stream names are TableA-f1, TableA-f2, respectively.

To name a stream:

=> COPY mytable FROM 'myfile' DELIMITER '|' DIRECT STREAM NAME 'My stream name';

NO COMMIT

Prevents the COPY statement from committing its transaction automatically when it finishes copying data.

For more information, see Overriding COPY Auto Commit.

This option is not available for ORC or Parquet data.
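Under NO COMMIT, you control the transaction boundary yourself; a sketch, with hypothetical table and path names:

```sql
-- Load without auto-commit, verify the result, then commit (or roll back) manually.
COPY t FROM '/data/input.txt' NO COMMIT;
SELECT COUNT(*) FROM t;   -- sanity check within the open transaction
COMMIT;
```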