Hash Function Result Depends on Data Type and Sometimes Size

Posted July 18, 2019 by James Knicely, Vertica Field Chief Technologist

Vintage businessman concept pointing on the wall wearing futuristic helmet at office
The Vertica HASH function calculates a hash value over the function arguments, producing a value in the range 0 <= x < 2^63. When calculating a hash value, the HASH function takes into consideration the data type and size of the original value.

Be careful when comparing hashed column values from different tables where the columns have different data types and sizes. They may not have the same hashed value.

Example: dbadmin=> CREATE TABLE a (c INT); CREATE TABLE dbadmin=> INSERT INTO a SELECT 1; OUTPUT -------- 1 (1 row) dbadmin=> CREATE TABLE b ( c NUMERIC(10, 1)); CREATE TABLE dbadmin=> INSERT INTO b SELECT 1; OUTPUT -------- 1 (1 row) dbadmin=> SELECT COUNT(*) FROM a JOIN b ON b.c = a.c; COUNT ------- 1 (1 row) dbadmin=> SELECT COUNT(*) FROM a JOIN b ON HASH(b.c) = HASH(a.c); COUNT ------- 0 (1 row) dbadmin=> \x Expanded display is on. dbadmin=> SELECT HASH(12345::INT) "12345::INT", dbadmin-> HASH(12345::VARCHAR(10)) "12345::VARCHAR(10)", dbadmin-> HASH(12345::VARCHAR(100)) "12345::VARCHAR(100)", dbadmin-> HASH(12345::NUMERIC(100,0)) "12345::NUMERIC(100,0)", dbadmin-> HASH(12345::NUMERIC(100,5)) "12345::NUMERIC(100,5)", dbadmin-> HASH(12345::NUMERIC(100,10)) "12345::NUMERIC(100,10)", dbadmin-> HASH(12345::FLOAT) "12345::FLOAT"; -[ RECORD 1 ]----------+-------------------- 12345::INT | 4960063907862199528 12345::VARCHAR(10) | 7521734626139061666 12345::VARCHAR(100) | 7521734626139061666 12345::NUMERIC(100,0) | 3327062938535469805 12345::NUMERIC(100,5) | 6342074168025654472 12345::NUMERIC(100,10) | 560125968008200468 12345::FLOAT | 4397895841065323663 Helpful Link: https://www.vertica.com/docs/latest/HTML/Content/Authoring/SQLReferenceManual/Functions/Mathematical/HASH.htm

Have fun!