
Vertica can divide the work of loading data, taking advantage of parallelism to speed up the operation. One supported type of parallelism is called apportioned load.
An apportioned load divides a single large file or other single source into segments (portions), which are assigned to several nodes to be loaded in parallel.
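To make the idea of "portions" concrete, here is a minimal Python sketch that splits a newline-delimited file into byte ranges ending on record boundaries, so each range could be handed to a different worker. It illustrates the concept only and is not how Vertica implements apportioning; the function name `apportion` and its parameters are hypothetical.

```python
import os
import tempfile


def apportion(path, num_portions):
    """Return up to num_portions (start, end) byte ranges covering the file.

    Every range ends just after a newline (or at end of file), so no
    record is split between two portions. Hypothetical helper, for
    illustration only.
    """
    if num_portions < 1:
        raise ValueError("num_portions must be at least 1")
    size = os.path.getsize(path)
    if size == 0:
        return []
    target = max(1, size // num_portions)  # rough bytes per portion
    portions = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            if len(portions) == num_portions - 1:
                end = size  # the last portion takes whatever remains
            else:
                end = min(start + target, size)
                if end < size:
                    # Move forward to the end of the record containing `end`.
                    f.seek(end)
                    f.readline()
                    end = f.tell()
            portions.append((start, end))
            start = end
    return portions


if __name__ == "__main__":
    # Small demonstration on a throwaway file of ten short records.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"".join(b"row%d\n" % i for i in range(10)))
    try:
        for start, end in apportion(tmp.name, 3):
            print(f"portion bytes {start}..{end}")
    finally:
        os.remove(tmp.name)
```

Each worker would then read only its own byte range, which is why the source has to be splittable at record boundaries for this kind of parallelism to work.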
Example:
I want to load a data file that contains 100,000,000 records.
dbadmin=> \! wc -l /home/dbadmin/big_data.txt
100000000 /home/dbadmin/big_data.txt
For my first attempt, I’ll load the file from a single node in my three-node cluster.
dbadmin=> \timing
Timing is on.
dbadmin=> COPY big_data FROM '/home/dbadmin/big_data.txt' DIRECT;
Rows Loaded
-------------
100000000
(1 row)
Time: First fetch (1 row): 49078.222 ms. All rows formatted: 49078.268 ms
Next, I’ll rerun the load, this time adding the “ON ANY NODE” option to the COPY command so that Vertica can perform an apportioned load.
dbadmin=> COPY big_data FROM '/home/dbadmin/big_data.txt' ON ANY NODE DIRECT;
Rows Loaded
-------------
100000000
(1 row)
Time: First fetch (1 row): 21141.006 ms. All rows formatted: 21141.045 ms
Wow! The apportioned load ran more than twice as fast as the single-node load (about 2.3 times as fast), which works out to roughly 57% less elapsed time:
dbadmin=> SELECT 100 - (21141.006 / 49078.222 * 100) || '%' PCT_FASTER;
PCT_FASTER
---------------------
56.923855146993700%
(1 row)
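The query above reports the reduction in elapsed time, which is a different figure from the speedup factor. As a quick cross-check, this small Python sketch derives both from the two timings shown in the sessions; the variable names are illustrative only.

```python
# Elapsed times in milliseconds, taken from the two COPY runs above.
single_node_ms = 49078.222
apportioned_ms = 21141.006

# Speedup factor: how many times faster the apportioned load ran.
speedup = single_node_ms / apportioned_ms

# Reduction in elapsed time, the same quantity the SQL query computes.
time_saved_pct = 100 - (apportioned_ms / single_node_ms * 100)

print(f"speedup: {speedup:.2f}x")
print(f"elapsed time reduced by: {time_saved_pct:.1f}%")
```

A 57% reduction in elapsed time and a 2.3x speedup describe the same result; the two numbers are just different ways of expressing it.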
Helpful link: https://my.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/ExtendingVertica/UDx/UDL/ApportionedLoad.htm
Have fun!