Batch Inserts Using JDBC Prepared Statements

You can load batches of data into Vertica using prepared INSERT statements—server-side statements that you set up once, and then call repeatedly. You instantiate a member of the PreparedStatement class with a SQL statement that contains question mark placeholders for data. For example:

PreparedStatement pstmt = conn.prepareStatement(
                    "INSERT INTO customers(last, first, id) VALUES(?,?,?)");

You then set the parameters using data-type-specific methods on the PreparedStatement object, such as setString() and setInt(). Once your parameters are set, call the addBatch() method to add the row to the batch. When you have a complete batch of data ready, call the executeBatch() method to execute the insert batch.

Behind the scenes, the batch insert is converted into a COPY statement. When the connection's AutoCommit parameter is disabled, Vertica keeps the COPY statement open and uses it to load subsequent batches until the transaction is committed, the cursor is closed, or your application executes anything else (or executes any statement using another Statement or PreparedStatement object). Using a single COPY statement for multiple batch inserts makes loading data more efficient. If you are loading multiple batches, you should disable the AutoCommit property of the database to take advantage of this increased efficiency.

When performing batch inserts, experiment with various batch and row sizes to determine the settings that provide the best performance.

The following example demonstrates using a prepared statement to batch insert data.

import java.sql.*;
import java.util.Properties;
 
public class BatchInsertExample {
    public static void main(String[] args) {
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
		
	 //Set streamingBatchInsert to True to enable streaming mode for batch inserts.
	 //myProp.put("streamingBatchInsert", "True");
        
	 Connection conn;
        try {
            conn = DriverManager.getConnection(
                            "jdbc:vertica://VerticaHost:5433/ExampleDB",
                            myProp);
            // establish connection and make a table for the data.
            Statement stmt = conn.createStatement();
            
            
            // Set AutoCommit to false to allow Vertica to reuse the same
            // COPY statement
            conn.setAutoCommit(false);
            
            
            // Drop table and recreate.
            stmt.execute("DROP TABLE IF EXISTS customers CASCADE");
            stmt.execute("CREATE TABLE customers (CustID int, Last_Name"
                            + " char(50), First_Name char(50),Email char(50), "
                            + "Phone_Number char(12))");
            // Some dummy data to insert. 
            String[] firstNames = new String[] { "Anna", "Bill", "Cindy",
                            "Don", "Eric" };
            String[] lastNames = new String[] { "Allen", "Brown", "Chu",
                            "Dodd", "Estavez" };
            String[] emails = new String[] { "aang@example.com",
                            "b.brown@example.com", "cindy@example.com",
                            "d.d@example.com", "e.estavez@example.com" };
            String[] phoneNumbers = new String[] { "123-456-7890",
                            "555-444-3333", "555-867-5309",
                            "555-555-1212", "781-555-0000" };
            // Create the prepared statement
            PreparedStatement pstmt = conn.prepareStatement(
                            "INSERT INTO customers (CustID, Last_Name, " + 
                            "First_Name, Email, Phone_Number)" +
                            " VALUES(?,?,?,?,?)");
            // Add rows to a batch in a loop. Each iteration adds a
            // new row.
            for (int i = 0; i < firstNames.length; i++) {
                // Add each parameter to the row.
                pstmt.setInt(1, i + 1);
                pstmt.setString(2, lastNames[i]);
                pstmt.setString(3, firstNames[i]);
                pstmt.setString(4, emails[i]);
                pstmt.setString(5, phoneNumbers[i]);
                // Add row to the batch.
                pstmt.addBatch();
            }
	     
            try {
                // Batch is ready, execute it to insert the data
                pstmt.executeBatch();
            } catch (SQLException e) {
                System.out.println("Error message: " + e.getMessage());
                return; // Exit if there was an error
            }
            
            // Commit the transaction to close the COPY command
            conn.commit();
            
            
            // Print the resulting table.
            ResultSet rs = null;
            rs = stmt.executeQuery("SELECT CustID, First_Name, "
                            + "Last_Name FROM customers ORDER BY CustID");
            while (rs.next()) {
                System.out.println(rs.getInt(1) + " - "
                                + rs.getString(2).trim() + " "
                                + rs.getString(3).trim());
            }
            // Cleanup
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

The result of running the example code is:

1 - Anna Allen
2 - Bill Brown
3 - Cindy Chu
4 - Don Dodd
5 - Eric Estavez

Streaming Batch Inserts

By default, Vertica performs batch inserts by caching each row and inserting the cache when the user calls the executeBatch() method. Vertica also supports streaming batch inserts. A streaming batch insert adds a row to the database each time the user calls addBatch(). Streaming batch inserts improve database performance by allowing parallel processing and reducing memory demands.

Once you begin a streaming batch insert, you cannot make other JDBC calls that require client-server communication until you have executed the batch or closed or rolled back the connection.

To enable streaming batch inserts, set the streamingBatchInsert property to True. The preceding code sample includes a line enabling streamingBatchInsert mode. Remove the // comment marks to enable this line and activate streaming batch inserts.

The following table explains the various batch insert methods and how their behavior differs between default batch insert mode and streaming batch insert mode.

Method Default Batch Insert Behavior Streaming Batch Insert Behavior
addBatch() Adds a row to the row cache. Inserts a row into the database.
executeBatch() Adds the contents of the row cache to the database in a single action. Sends an end-of-batch message to the server and returns an array of integers indicating the success or failure of each addBatch() attempt.
clearBatch() Clears the row cache without inserting any rows. Not supported. Triggers an exception if used when streaming batch inserts are enabled.

Notes

  • Using the PreparedStatement.setFloat() method can cause rounding errors. If precision is important, use the .setDouble() method instead.
  • The PreparedStatement object caches the connection's AutoCommit property when the statement is prepared. Later changes to the AutoCommit property have no effect on the prepared statement.