Using Parallel Load Streams

Vertica can divide the work of loading data, taking advantage of parallelism to speed up the operation. Vertica supports several types of parallelism:

See General Parameters for information about the configuration parameters.

Specifying Distributed File Loads

The COPY FROM statement provides several ways to distribute a file load.

You can direct individual files in a multi-file load to specific nodes, as in the following example of distributed load.

=> COPY t FROM '/data/file1.dat' ON v_vmart_node0001, '/data/file2.dat' ON v_vmart_node0002;

You can use globbing (wildcard expansion) to specify a group of files with the ON ANY NODE directive, as in the following example.

=> COPY t FROM '/data/*.dat' ON ANY NODE;

If you have a single file instead of a group of files, you can still, potentially, benefit from apportioned load. The file must be large enough to divide into portions at least equal to ApportionedFileMinimumPortionSizeKB in size. You must also use a parser that supports apportioned load. The delimited parser built into Vertica supports apportioned load, but other parsers might not.

The following example shows how you can load a single large file using multiple nodes.

=> COPY t FROM '/data/bigfile.dat' ON ANY NODE;

You can limit the nodes that participate in an apportioned load. Doing so is useful if you need to balance several concurrent loads. Vertica apportions each load individually; it does not account for other loads that might be in progress on those nodes. You can, therefore, potentially speed up your loads by managing apportioning yourself.

The following example shows how you can apportion loads on specific nodes.

=> COPY t FROM '/data/big1.dat' ON (v_vmart_node0001, v_vmart_node0002, v_vmart_node0003),
		'/data/big2.dat' ON (v_vmart_node0004, v_vmart_node0005);

Loaded files can be of different formats, such as BZIP, GZIP, and others. However, because file compression is a filter, you cannot use apportioned load for a compressed file.

Specifying Distributed Loads with Sources

You can also apportion loads using COPY WITH SOURCE. You can create sources and parsers with the User-Defined Load (UDL) API. If both the source and parser support apportioned load, and EnableApportionLoad is set, then Vertica attempts to divide the load among nodes.

The following example shows a load that you could apportion.

=> COPY t WITH SOURCE MySource() PARSER MyParser();

The built-in delimited parser supports apportioning, so you can use it with a user-defined source, as in the following example.

=> COPY t WITH SOURCE MySource();

Number of Load Streams

Although the number of files you can load is not restricted, the optimal number of load streams depends on several factors, including:

Using too many load streams can deplete or reduce system memory required for optimal query processing. See Best Practices for Managing Workload Resources for advice on configuring load streams.

 


Was this topic helpful?