Bulk Loading Using the COPY Statement
One of the fastest ways to load large amounts of data into Vertica at once (bulk loading) is to use the COPY statement. This statement loads data from a file stored on a Vertica host (or in a data stream) into a table in the database. You can pass the COPY statement parameters that define the format of the data in the file, how the data is to be transformed as it is loaded, how to handle errors, and how the data should be loaded. See the COPY documentation in the SQL Reference Manual for details.
One parameter that is particularly important is the DIRECT option, which tells COPY to load the data directly into ROS rather than going through the WOS. You should use this option when you are loading large files (over 100MB) into the database. Without this option, your load may fill the WOS and overflow into ROS, requiring the Tuple Mover to perform a Moveout on the data in the WOS. It is more efficient to directly load into ROS and avoid forcing a moveout.
Only a superuser can use the COPY statement to copy a file stored on a host, so you must connect to the database using a superuser account. If you want to have a non-superuser user bulk-load data, you can use COPY to load from a stream on the host (such as STDIN) rather than a file or stream data from the client (see Streaming Data Via JDBC). You can also perform a standard batch insert using a prepared statement, which uses the COPY statement in the background to load the data.
Note: When using this COPY parameter on any node
, confirm that the source file is identical on all nodes. Using different files can produce inconsistent results.
The following example demonstrates using the COPY statement through the JDBC to load a file name customers.txt
into a new database table. This file must be stored on the database host to which your application connects (in this example, a host named VerticaHost). Since the customers.txt
file used in the example is very large, this example uses the DIRECT option to bypass WOS and load directly into ROS.
import java.sql.*; import java.util.Properties; import com.vertica.jdbc.*; public class COPYFromFile { public static void main(String[] args) { Properties myProp = new Properties(); myProp.put("user", "ExampleAdmin"); // Must be superuser myProp.put("password", "password123"); Connection conn; try { conn = DriverManager.getConnection( "jdbc:vertica://VerticaHost:5433/ExampleDB",myProp); // Disable AutoCommit conn.setAutoCommit(false); Statement stmt = conn.createStatement(); // Create a table to hold data. stmt.execute("DROP TABLE IF EXISTS customers;"); stmt.execute("CREATE TABLE IF NOT EXISTS customers (Last_Name char(50) " + "NOT NULL, First_Name char(50),Email char(50), " + "Phone_Number char(15))"); // Use the COPY command to load data. Load directly into ROS, since // this load could be over 100MB. Use ENFORCELENGTH to reject // strings too wide for their columns. boolean result = stmt.execute("COPY customers FROM " + " '/data/customers.txt' DIRECT ENFORCELENGTH"); // Determine if execution returned a count value, or a full result // set. if (result) { System.out.println("Got result set"); } else { // Count will usually return the count of rows inserted. System.out.println("Got count"); int rowCount = stmt.getUpdateCount(); System.out.println("Number of accepted rows = " + rowCount); } // Commit the data load conn.commit(); } catch (SQLException e) { System.out.print("Error: "); System.out.println(e.toString()); } } }
The example prints the following out to the system console when run (assuming that the customers.txt
file contained two million valid rows):
Number of accepted rows = 2000000