Filtering Data While Loading into Vertica

Posted March 22, 2017 by Sarah Lemaire, Manager, Vertica Documentation

Suppose you have a CSV file and you want to copy some, but not all, of its contents into a Vertica table. There are two ways you can to do this:
• Use the SKIP keyword with COPY.
• Use the head or tail Linux command.
Let’s see how this works.

The Data

Here’s a small example data set, a csv file named, students.csv, that contains the following information for 20 students: Smith, John, 1977, 1981, Bowers Hall Brown, Tim, 1978, 1982, Blue Hall Howe, Mary, 1976, 1980, White Hall Kelly, Sarah, 1977, 1981, Blue Hall Jones, William, 1977, 1981, Johnson Hall Brady, Mark, 1976, 1980, Madison Hall Black, Howard, 1975, 1979, Bowers Hall King, Martha, 1976, 1980, White Hall McCoy, Keith, 1978, 1982, Madison Hall McDonald, Susan, 1977, 1981, Johnson Hall Williams, Joe, 1975, 1979, Bowers Hall Johnson, Julie, 1978, 1982, White Hall Barry, Polly, 1976, 1980, Blue Hall Wilson, Martin, 1975, 1979, Johnson Hall Adams, Harry, 1976, 1980, White Hall Ball, Terry, 1977, 1981, Bowers Hall Chan, Ty, 1978, 1982, Johnson Hall David, Donald, 1975, 1979, Madison Hall Emory, Fletcher, 1976, 1980, White Hall Flaherty, Finbarr, 1977 1981, Blue Hall To load this data into Vertica, define a table as follows: => CREATE TABLE students ( Last VARCHAR(64), First VARCHAR(64), EnrollYear INT, GradYear INT, Dorm VARCHAR(64) ); Now you can load the data from the .csv file into the students table: => COPY students FROM '/home/dbadmin/students.csv' DELIMITER ','; Rows loaded ------------- 20 (1 row) => SELECT * FROM students; Last | First | EnrollYear | GradYear | Dorm ----------+-----------+------------+----------+--------------- Howe | Mary | 1976 | 1980 | White Hall King | Martha | 1976 | 1980 | White Hall McCoy | Keith | 1978 | 1982 | Madison Hall Williams | Joe | 1975 | 1979 | Bowers Hall Barry | Polly | 1976 | 1980 | Blue Hall Emory | Fletcher | 1976 | 1980 | White Hall Flaherty | Finbarr | 1977 | 1981 | Blue Hall Jones | William | 1977 | 1981 | Johnson Hall Brady | Mark | 1976 | 1980 | Madison Hall Black | Howard | 1975 | 1979 | Bowers Hall Wilson | Martin | 1975 | 1979 | Johnson Hall David | Donald | 1975 | 1979 | Madison Hall Smith | John | 1977 | 1981 | Bowers Hall Brown | Tim | 1978 | 1982 | Blue Hall Kelly | Sarah | 1977 | 1981 | Blue Hall McDonald | Susan | 1977 | 1981 | Johnson Hall Johnson | Julie | 1978 | 1982 | White Hall Adams | Harry | 1976 | 1980 | White Hall Ball | Terry | 1977 | 1981 | Bowers Hall Chan | Ty | 1978 | 1982 | Johnson Hall (20 rows)

Using COPY to Filter Rows

Let’s create a new table and copy the same table except for the first 10 entries. To omit the first 10 entries in the CSV file, use the SKIP keyword. We’ll name the new table students_last_ten: => CREATE TABLE students_last_ten ( Last VARCHAR(64), First VARCHAR(64), EnrollYear INT, GradYear INT, Dorm VARCHAR(64) ); => COPY students_last_ten FROM '/home/dbadmin/students.csv' DELIMITER ',' SKIP 10; Rows Loaded ------------- 10 (1 row) => SELECT * FROM students_last_ten; Last | First | EnrollYear | GradYear | Dorm ----------+-----------+------------+----------+------------- Williams | Joe | 1975 | 1979 | Bowers Hall Barry | Polly | 1976 | 1980 | Blue Hall Emory | Fletcher | 1976 | 1980 | White Hall Flaherty | Finbarr | 1977 | 1981 | Blue Hall Johnson | Julie | 1978 | 1982 | White Hall Adams | Harry | 1976 | 1980 | White Hall Ball | Terry | 1977 | 1981 | Bowers Hall Chan | Ty | 1978 | 1982 | Johnson Hall Wilson | Martin | 1975 | 1979 | Johnson Hall David | Donald | 1975 | 1979 | Madison Hall (10 rows)

Using Linux to Filter Rows

You can also use the Linux head and tail commands to write certain rows from the top or the bottom of a file to a Vertica table. For this next example, we’ll load the first 10 rows from the student data. => CREATE TABLE students_last_ten ( Last VARCHAR(64), First VARCHAR(64), EnrollYear INT, GradYear INT, Dorm VARCHAR(64) ); CREATE TABLE => \q Let’s run the tail command to verify that it gives us the last 10 rows. Note that running the head command would give us the first 10 rows. $ tail -10 /home/dbadmin/students.csv Williams, Joe, 1975, 1979, Bowers Hall Johnson, Julie, 1978, 1982, White Hall Barry, Polly, 1976, 1980, Blue Hall Wilson, Martin, 1975, 1979, Johnson Hall Adams, Harry, 1976, 1980, White Hall Ball, Terry, 1977, 1981, Bowers Hall Chan, Ty, 1978, 1982, Johnson Hall David, Donald, 1975, 1979, Madison Hall Emory, Fletcher, 1976, 1980, White Hall Flaherty, Finbarr, 1977, 1981, Blue Hall Now we’ll rerun the tail command and pipe that output into the Vertica table we created. $ head -10 /home/dbadmin/students.csv | vsql -d vmart -U dbadmin -c "COPY students_last_ten FROM LOCAL STDIN DELIMITER ',' ;" Rows Loaded ------------- 10 (1 row) Let’s make sure the data was actually loaded into the student_last_ten table. $ vsql => SELECT * FROM students_last_ten; Last | First | EnrollYear | GradYear | Dorm ----------+----------+------------+----------+--------------- Black | Howard | 1975 | 1979 | Bowers Hall Brady | Mark | 1976 | 1980 | Madison Hall Jones | William | 1977 | 1981 | Johnson Hall Brown | Tim | 1978 | 1982 | Blue Hall Kelly | Sarah | 1977 | 1981 | Blue Hall McDonald | Susan | 1977 | 1981 | Johnson Hall Smith | John | 1977 | 1981 | Bowers Hall Howe | Mary | 1976 | 1980 | White Hall King | Martha | 1976 | 1980 | White Hall McCoy | Keith | 1978 | 1982 | Madison Hall (10 rows)

For More Information

COPY Parameters in the Vertica Documentation
head Man Page
tail Man Page