Using the COPY Statement
The COPY statement lets you bulk load data from a file on stored on a database node into the Vertica database. This method is the most efficient way to load data into Vertica because the file resides on the database server. One drawback is that only a database superuser can use COPY, since it requires privilege in order to access the filesystem of the database node.
One drawback of using COPY instead of performing batch loads is that you can only get results of the load (the number of accepted and rejected rows) when the COPY statement has finished. With batch loads, you can monitor the progress as batches are inserted. The ability to monitor the progress of a load can be a useful feature if you want to stop loading if a large portion of the data is being rejected.
A primary concern when bulk loading data using COPY is deciding whether the data should be loaded directly into ROS using the DIRECT option, or by using the AUTO method (loading into WOS until it fills, then loading into ROS). You should load directly into the ROS when your transaction will load a large (more than 100MB of data or so) amount of data.
Note: The exceptions/rejections files are created on the client machine when the exceptions and rejected data modifiers are specified on the COPY command. Specify a local path and filename for these modifiers when executing a COPY query from the driver.
The following example loads data into the WOS (Write Optimized Store) until it fills, then stores additional data directly in ROS (Read Optimized Store).
ret=SQLExecDirect(hdlStmt, (SQLCHAR*)"COPY customers " "FROM '/data/customers.txt' AUTO",SQL_NTS);
The following example loads data into the ROS (Read Optimized Store.
ret=SQLExecDirect(hdlStmt, (SQLCHAR*)"COPY customers " "FROM '/data/customers.txt' DIRECT",SQL_NTS);
See the COPY statement in the SQL Reference Manual for more information about its syntax and use.
The following example demonstrates using the COPY command.
// Some standard headers #include <stdio.h> #include <stdlib.h> // Only needed for Windows clients // #include <windows.h> // Standard ODBC headers #include <sql.h> #include <sqltypes.h> #include <sqlext.h> // Helper function to determine if an ODBC function call returned // successfully. bool notSuccess(SQLRETURN ret) { return (ret != SQL_SUCCESS && ret != SQL_SUCCESS_WITH_INFO); } int main() { // Set up the ODBC environment SQLRETURN ret; SQLHENV hdlEnv; ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv); if(notSuccess(ret)) { printf("Could not allocate a handle.\n"); exit(EXIT_FAILURE); } else { printf("Allocated an environment handle.\n"); } // Tell ODBC that the application uses ODBC 3. ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER); if(notSuccess(ret)) { printf("Could not set application version to ODBC3.\n"); exit(EXIT_FAILURE); } else { printf("Set application to ODBC 3.\n"); } // Allocate a database handle. SQLHDBC hdlDbc; ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc); // Connect to the database printf("Connecting to database.\n"); const char *dsnName = "ExampleDB"; // Note: User MUST be a database superuser to be able to access files on the // filesystem of the node. const char* userID = "dbadmin"; const char* passwd = "password123"; ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName, SQL_NTS,(SQLCHAR*)userID,SQL_NTS, (SQLCHAR*)passwd, SQL_NTS); if(notSuccess(ret)) { printf("Could not connect to database.\n"); exit(EXIT_FAILURE); } else { printf("Connected to database.\n"); } // Disable AUTOCOMMIT printf("Disabling autocommit.\n"); ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF, SQL_NTS); if(notSuccess(ret)) { printf("Could not disable autocommit.\n"); exit(EXIT_FAILURE); } // Set up a statement handle SQLHSTMT hdlStmt; SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt); // Create table to hold the data SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers", SQL_NTS); SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers" "(Last_Name char(50) NOT NULL, First_Name char(50),Email char(50), " "Phone_Number char(15));", SQL_NTS); // Run the copy command to load data into ROS. ret=SQLExecDirect(hdlStmt, (SQLCHAR*)"COPY customers " "FROM '/data/customers.txt' DIRECT", SQL_NTS); if(notSuccess(ret)) { printf("Data was not successfully loaded.\n"); exit(EXIT_FAILURE); } else { // Get number of rows added. SQLLEN numRows; ret=SQLRowCount(hdlStmt, &numRows); printf("Successfully inserted %d rows.\n", numRows); } // Done with batches, commit the transaction printf("Committing transaction\n"); ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT); if(notSuccess(ret)) { printf("Could not commit transaction\n"); } else { printf("Committed transaction\n"); } // Clean up printf("Free handles.\n"); SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt); SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc); SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv); exit(EXIT_SUCCESS); }
The example prints the following when run:
Allocated an environment handle. Set application to ODBC 3. Connecting to database. Connected to database. Disabling autocommit. Successfully inserted 10001 rows. Committing transaction Committed transaction Free handles.