Faster Data Loads with Apportioned Load: Quick Tip

Jim Knicely authored this tip.

Vertica can divide the work of loading data, taking advantage of parallelism to speed up the operation. One supported type of parallelism is called apportioned load.

An apportioned load divides a single large file or other single source into segments (portions), which are assigned to several nodes to be loaded in parallel.
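To make the idea of "portions" concrete, here is a minimal Python sketch of one way to cut a single newline-delimited file into byte ranges that never split a record. Vertica does the real apportioning internally; the function name `apportion` and its behavior are hypothetical and only illustrate the concept. Each nominal cut point is moved forward to just past the next record delimiter, so every portion holds whole records and could be parsed independently by a different node.

```python
def apportion(data: bytes, num_portions: int, delimiter: bytes = b"\n") -> list[tuple[int, int]]:
    """Hypothetical helper: split data into up to num_portions [start, end)
    byte ranges, moving each cut so that no record is split in half."""
    if num_portions < 1:
        raise ValueError("num_portions must be >= 1")
    size = len(data)
    boundaries = [0]
    for i in range(1, num_portions):
        nominal = size * i // num_portions  # evenly spaced cut point
        # Advance the cut to just past the next record delimiter.
        pos = data.find(delimiter, nominal)
        cut = size if pos == -1 else pos + len(delimiter)
        if cut > boundaries[-1]:  # skip empty portions
            boundaries.append(cut)
    if boundaries[-1] != size:
        boundaries.append(size)
    return [(boundaries[k], boundaries[k + 1]) for k in range(len(boundaries) - 1)]


# Illustrative input: 10 small records split into 3 portions, one per node.
records = b"".join(f"row{i}\n".encode() for i in range(10))
for node, (start, end) in enumerate(apportion(records, 3)):
    n_records = records[start:end].count(b"\n")
    print(f"node {node}: bytes [{start}, {end}), {n_records} records")
```

With this input the three portions hold 4, 3, and 3 records. Portions are only approximately equal in size because each cut is pushed to the next record boundary.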

Example:

I want to load a data file that contains 100,000,000 records.

dbadmin=> \! wc -l /home/dbadmin/big_data.txt
100000000 /home/dbadmin/big_data.txt

For my first load attempt, I’ll load the file from a single node in my 3-node cluster.

dbadmin=> \timing
Timing is on.

dbadmin=> COPY big_data FROM '/home/dbadmin/big_data.txt' DIRECT;
Rows Loaded
-------------
   100000000
(1 row)

Time: First fetch (1 row): 49078.222 ms. All rows formatted: 49078.268 ms

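Next, I’ll re-run the load, but this time include the ON ANY NODE option of the COPY command so that Vertica performs an apportioned load. This option tells Vertica that the source file is available on all of the nodes, so any node can open the file and parse its portion of it.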

dbadmin=> COPY big_data FROM '/home/dbadmin/big_data.txt' ON ANY NODE DIRECT;
Rows Loaded
-------------
   100000000
(1 row)

Time: First fetch (1 row): 21141.006 ms. All rows formatted: 21141.045 ms
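If you want to repeat this comparison from a script instead of typing it into vsql, the following is a minimal Python sketch. It assumes a DB-API 2.0 cursor (for example, one obtained from the vertica-python driver) and that COPY returns its one-row "Rows Loaded" result to the client; the helper names `build_copy` and `timed_copy` are hypothetical. Timing is measured on the client, so it includes network round-trip time and will not exactly match vsql's `\timing` output.

```python
import time
from typing import Any


def build_copy(table: str, path: str, on_any_node: bool) -> str:
    """Hypothetical helper: build the COPY statement used in this tip."""
    node_clause = " ON ANY NODE" if on_any_node else ""
    return f"COPY {table} FROM '{path}'{node_clause} DIRECT"


def timed_copy(cursor: Any, sql: str) -> tuple[int, float]:
    """Run one COPY on a DB-API cursor; return (rows loaded, elapsed ms)."""
    start = time.perf_counter()
    cursor.execute(sql)
    rows_loaded = cursor.fetchone()[0]  # assumes COPY returns a one-row "Rows Loaded" result
    elapsed_ms = (time.perf_counter() - start) * 1000
    return rows_loaded, elapsed_ms


# Hypothetical usage (conn_info is a placeholder for your connection settings):
# import vertica_python
# with vertica_python.connect(**conn_info) as conn:
#     cur = conn.cursor()
#     for any_node in (False, True):
#         sql = build_copy("big_data", "/home/dbadmin/big_data.txt", any_node)
#         print(any_node, timed_copy(cur, sql))
```

Because each run appends another copy of the file, you may want to truncate the table between runs so both loads start from the same state.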

Wow! The apportioned load ran more than twice as fast as the single-node load (about 21.1 seconds versus 49.1 seconds), cutting the elapsed time by roughly 57%:

dbadmin=> SELECT 100 - (21141.006 / 49078.222 * 100) || '%' PCT_FASTER;
     PCT_FASTER
---------------------
56.923855146993700%
(1 row)
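Note that the query above measures the percentage reduction in elapsed time, which is not the same as the speedup factor. The short Python sketch below computes both from the two timings shown in this tip: the speedup is the ratio of the two times, while the reduction is the share of the original time that was saved.

```python
single_node_ms = 49078.222  # COPY ... DIRECT (single node), from the timing above
apportioned_ms = 21141.006  # COPY ... ON ANY NODE DIRECT, from the timing above

speedup = single_node_ms / apportioned_ms                          # how many times faster
pct_time_saved = 100 - (apportioned_ms / single_node_ms * 100)     # same formula as the SQL query

print(f"speedup: {speedup:.2f}x")
print(f"elapsed time reduced by: {pct_time_saved:.2f}%")
```

This prints a speedup of 2.32x and a reduction of 56.92%, matching the query result.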

Helpful link:

https://my.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/ExtendingVertica/UDx/UDL/ApportionedLoad.htm

Have fun!

About the Author

Phil Molea
Sr. Information Developer, Vertica

Phil developed technical documentation in the areas of security and diagnostics for the Vertica Analytics Platform that enabled companies to extract value from their data at the speed and scale they need to thrive in today’s economy.

(Sadly, Phil passed away recently. He will be missed.)
