Confirming if a Data Load was Apportioned: Quick Tip

Posted December 13, 2018 by Jim Knicely, Big Data Solutions Architect, Vertica

Jim Knicely authored this tip. When running a COPY command, a parser can use more than one database node to load a single input source in parallel. This approach is referred to as apportioned load and is often much faster than loading from a single node. One way of confirming that a data load took advantage of apportioning is to query the LOAD_SOURCES system table after running the COPY command. If the load was apportioned, you’ll get a list of nodes that participated! Example: [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "SELECT node_name, node_address FROM nodes;" Timing is off. node_name | node_address ----------------+---------------- v_mydb_node0001 | 192.168.61.227 v_mydb_node0002 | 192.168.61.228 v_mydb_node0003 | 192.168.61.229 (3 rows) [dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.227 “ls -lrt /home/dbadmin/data2.txt” -rw-r–r– 1 dbadmin verticadba 184500036 Dec 11 22:01 /home/dbadmin/data2.txt [dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.228 “ls -lrt /home/dbadmin/data2.txt” -rw-r–r– 1 dbadmin verticadba 184500036 Dec 11 22:03 /home/dbadmin/data2.txt [dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.229 “ls -lrt /home/dbadmin/data2.txt” -rw-r–r– 1 dbadmin verticadba 184500036 Dec 11 22:03 /home/dbadmin/data2.txt [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic “COPY big_fact2 FROM ‘/home/dbadmin/data2.txt’ ON ANY NODE; — This load will be apportioned” Timing is off. Rows Loaded ————- 1500000 (1 row) [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic “SELECT node_name, source_name, input_size FROM load_sources WHERE table_name = ‘big_fact2’;” Timing is off. node_name | source_name | input_size ———— —+————————-+———— v_mydb_node0001 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0001 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0002 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0002 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0003 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0003 | /home/dbadmin/data2.txt | 30750006 (6 rows) Note that the original file size on each node is 184,500,036 bytes. Vertica split the load up into 6 pieces each 30,750,006 bytes in size. [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "SELECT 184500036 / 30750006 apportioned_pieces;" Timing is off. apportioned_pieces ---------------------- 6.000000000000000000 (1 row) Helpful Links: https://www.vertica.com/docs/latest/HTML/Content/Authoring/ExtendingVertica/UDx/UDL/ApportionedLoad.htm https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/LOAD_SOURCES.htm Have fun!