Using COPY LOCAL to Load Data in Perl
If you have delimited files (for example, a file with comma-separated values) on your client system that you want to load into Vertica, you can use the COPY LOCAL statement to directly load the file's contents into Vertica instead of using Perl to read, parse, and then batch insert the data. You execute a COPY LOCAL statement to load the file from the local filesystem. The result of executing the statement is the number of rows that were successfully inserted.
The following example code demonstrates loading a file named data.txt and located in the same directory as the Perl file into Vertica using a COPY LOCAL statement.
#!/usr/bin/perl use strict; use DBI; # Filesystem path handling module use File::Spec; # Create a hash reference that holds a hash of parameters for the # connection. my $attr = {AutoCommit => 0}; # Turn off AutoCommit # Open a connection using a DSN. Supply the username and password. my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123", $attr) or die "Failed to connect: $DBI::errstr"; print "Connected!\n"; # Drop any existing table. $dbh->do("DROP TABLE IF EXISTS Customers CASCADE;"); # Create a table to hold data. $dbh->do("CREATE TABLE Customers( \ ID INT, \ FirstName VARCHAR(100),\ LastName VARCHAR(100),\ Email VARCHAR(100),\ Birthday DATE)"); # Find the absolute path to the data file located in the current working # directory and named data.txt my $currDir = File::Spec->rel2abs(File::Spec->curdir()); my $dataFile = File::Spec->catfile($currDir, 'data.txt'); print "Loading file $dataFile\n"; # Load local file using copy local. Return value is the # of rows affected # which equates to the number of rows inserted. my $rows = $dbh->do("COPY Customers FROM LOCAL '$dataFile' DIRECT") or die $dbh->errstr; print "Copied $rows rows into database.\n"; $dbh->commit(); # Prepare a query to get the first 15 rows of the results my $sth = $dbh->prepare("SELECT * FROM Customers WHERE ID < 15 \ ORDER BY ID"); $sth->execute() or die "Error querying table: " . $dbh->errstr; my @row; # Pre-declare variable to hold result row used in format statement. # Use Perl formats to pretty print the output. Declare the heading for the # form. format STDOUT_TOP = ID First Last EMail Birthday == ===== ==== ===== ======== . # The Perl write statement will output a formatted line with values from the # @row array. See http://perldoc.perl.org/perlform.html for details. format STDOUT = @> @<<<<<<<<<<<<< @<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<< @row . # Loop through result rows while we have them while (@row = $sth->fetchrow_array()) { write; # Format command does the work of extracting the columns from # the @row array and writing them out to STDOUT. } # Call commit to prevent Perl from complaining about uncommitted transactions # when disconnecting $dbh->commit(); $dbh->disconnect();
The data.txt file is a text file containing a row of data on each line. The columns are delimited by pipe (|) characters. This is the default format that the COPY command accepts, which makes the COPY LOCAL statement in the example code simple. See the COPY statement entry in the SQL Reference Manual for handling data files that are in different formats. Here is an example of the content in this file:
1|Georgia|Gomez|Rhiannon@magna.us|1937-10-03 2|Abdul|Alexander|Kathleen@ipsum.gov|1941-03-10 3|Nigel|Contreras|Tanner@et.com|1955-06-01 4|Gray|Holt|Thomas@Integer.us|1945-12-06 5|Candace|Bullock|Scott@vitae.gov|1932-05-27 6|Matthew|Dotson|Keith@Cras.com|1956-09-30 7|Haviva|Hopper|Morgan@porttitor.edu|1975-05-10 8|Stewart|Sweeney|Rhonda@lectus.us|2003-06-20 9|Allen|Rogers|Alexander@enim.gov|2006-06-17 10|Trevor|Dillon|Eagan@id.org|1988-11-27 11|Leroy|Ashley|Carter@turpis.edu|1958-07-25 12|Elmo|Malone|Carla@enim.edu|1978-08-29 13|Laurel|Ball|Zelenia@Integer.us|1989-09-20 14|Zeus|Phillips|Branden@blandit.gov|1996-08-08 15|Alexis|Mclean|Flavia@Suspendisse.org|2008-01-07
The example code produces the following output when run on a large sample file:
Connected! Loading file /home/dbadmin/Perl/data.txt Copied 1000000 rows into database. ID First Last EMail Birthday == ===== ==== ===== ======== 1 Georgia Gomez Rhiannon@magna.us 1937-10-03 2 Abdul Alexander Kathleen@ipsum.gov 1941-03-10 3 Nigel Contreras Tanner@et.com 1955-06-01 4 Gray Holt Thomas@Integer.us 1945-12-06 5 Candace Bullock Scott@vitae.gov 1932-05-27 6 Matthew Dotson Keith@Cras.com 1956-09-30 7 Haviva Hopper Morgan@porttitor.edu 1975-05-10 8 Stewart Sweeney Rhonda@lectus.us 2003-06-20 9 Allen Rogers Alexander@enim.gov 2006-06-17 10 Trevor Dillon Eagan@id.org 1988-11-27 11 Leroy Ashley Carter@turpis.edu 1958-07-25 12 Elmo Malone Carla@enim.edu 1978-08-29 13 Laurel Ball Zelenia@Integer.us 1989-09-20 14 Zeus Phillips Branden@blandit.gov 1996-08-08
Note: Loading a single, large data file into Vertica through a single data connection is less efficient than loading a number of smaller files onto multiple nodes in parallel. Loading onto multiple nodes prevents any one node from becoming a bottleneck.