Another Way to De-Duplicate Table Rows: Quick Tip


To remove duplicate rows from a table, it is typically faster to create a temporary working table that contains all of the distinct rows from the original, drop the original table, and then rename the working table to the original table’s name.

Example:

dbadmin=> SELECT * FROM dups;
c1 | c2
----+----
  1 | A
  1 | A
  1 | A
  2 | B
  3 | C
  3 | C
  4 | D
(7 rows)

dbadmin=> CREATE TABLE dups_new LIKE dups INCLUDING PROJECTIONS;
CREATE TABLE

dbadmin=> INSERT /*+ DIRECT */ INTO dups_new SELECT DISTINCT * FROM dups;
OUTPUT
--------
      4
(1 row)

dbadmin=> DROP TABLE dups;
DROP TABLE

dbadmin=> ALTER TABLE dups_new RENAME TO dups;
ALTER TABLE

dbadmin=> SELECT * FROM dups;
c1 | c2
----+----
  1 | A
  2 | B
  3 | C
  4 | D
(4 rows)

The drawback of that solution is that privileges granted on the original table do not carry over to the new one, so you’ll need to make sure any grants that existed are restored on the renamed table.
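One way to keep track of those grants is to list them before dropping the original table and reapply them after the rename. The following is a minimal sketch, not a complete procedure: it assumes the example table dups lives in the public schema, and report_user is a hypothetical grantee used only for illustration.

```sql
-- Run BEFORE "DROP TABLE dups": record who holds which privileges on the table.
-- v_catalog.grants is a Vertica system table; the schema name 'public' is an
-- assumption about where the example table lives.
SELECT grantee, privileges_description
FROM v_catalog.grants
WHERE object_schema = 'public'
  AND object_name = 'dups';

-- Run AFTER "ALTER TABLE dups_new RENAME TO dups": reapply each recorded
-- privilege. report_user is a hypothetical grantee, shown for illustration only.
GRANT SELECT ON dups TO report_user;
```

If the query returns no rows, there are no table-level grants to restore.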

For smaller tables that have duplicate rows, here is another method to remove them that doesn’t involve creating a new table.

dbadmin=> SELECT * FROM dups2;
c1 | c2
----+----
  1 | A
  1 | A
  1 | A
  2 | B
  3 | C
  3 | C
  4 | D
(7 rows)

dbadmin=> INSERT /*+ DIRECT */ INTO dups2 SELECT DISTINCT c1, c2 FROM dups2;
OUTPUT
--------
      4
(1 row)

dbadmin=> DELETE /*+ DIRECT */ FROM dups2 WHERE epoch IS NOT NULL;
OUTPUT
--------
      7
(1 row)

dbadmin=> SELECT * FROM dups2;
c1 | c2
----+----
  1 | A
  2 | B
  3 | C
  4 | D
(4 rows)

dbadmin=> COMMIT;
COMMIT

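This method works because a table’s hidden EPOCH column is NULL for each newly inserted row until you issue a COMMIT statement. The DELETE therefore removes only the seven previously committed rows and leaves the four uncommitted distinct rows in place.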
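To see this for yourself, you can query the hidden column directly at different points in the transaction. The sketch below reuses the dups2 table from the example and assumes the session is not in autocommit mode, so that the INSERT and DELETE stay uncommitted until the final COMMIT.

```sql
-- Run after the INSERT and before the DELETE, in the same session.
-- Previously committed rows should show an epoch number; the rows just
-- inserted are expected to show a NULL epoch until COMMIT.
SELECT epoch, c1, c2
FROM dups2
ORDER BY c1, c2;

-- Optional check after the DELETE and before COMMIT: this should return
-- no rows if every (c1, c2) combination is now unique.
SELECT c1, c2, COUNT(*) AS copies
FROM dups2
GROUP BY c1, c2
HAVING COUNT(*) > 1;
```

If the second query still returns rows, issuing ROLLBACK instead of COMMIT should undo both the INSERT and the DELETE, provided nothing was committed in between.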

Helpful Link:

https://www.vertica.com/kb/Understanding-Vertica-Epochs/Content/BestPractices/Understanding-Vertica-Epochs.htm

Have fun!

About the Author

James Knicely
Vertica Field Chief Technologist

I've had the privilege of working with many database technologies in my career. But after being introduced to Vertica in May of 2011 as a client, I was hooked on the new technology after witnessing a query run in milliseconds that had previously run for hours on the legacy database we had in place. It was then that I knew I wanted to eventually join the Vertica team, and 4 years later I did! I am currently a Vertica evangelist and am ready to help you get on board! Please feel free to reach out to me with any questions you have about Vertica and make sure to follow my Vertica Quick Tips!
