Data Type Conversions from Hive to Vertica

The data types recognized by Hive differ from the data types recognized by Vertica. The following table lists how the HCatalog Connector converts Hive data types into data types compatible with Vertica.

Hive Data Type Vertica Data Type
TINYINT (1-byte) TINYINT (8-bytes)
SMALLINT (2-bytes) SMALLINT (8-bytes)
INT (4-bytes) INT (8-bytes)
BIGINT (8-bytes) BIGINT (8-bytes)
BOOLEAN BOOLEAN
FLOAT (4-bytes) FLOAT (8-bytes)
DECIMAL (precision, scale) DECIMAL (precision, scale)
DOUBLE (8-bytes) DOUBLE PRECISION (8-bytes)
CHAR (length in characters) CHAR (length in bytes)
VARCHAR (length in characters) VARCHAR (length in bytes), if length <= 65000
LONG VARCHAR (length in bytes), if length > 65000
STRING (2 GB max) VARCHAR (65000)
BINARY (2 GB max) VARBINARY (65000)
DATE DATE
TIMESTAMP TIMESTAMP
LIST/ARRAY VARCHAR (65000) containing a JSON-format representation of the list.
MAP VARCHAR (65000) containing a JSON-format representation of the map.
STRUCT VARCHAR (65000) containing a JSON-format representation of the struct.

Data-Width Handling Differences Between Hive and Vertica

The HCatalog Connector relies on Hive SerDe classes to extract data from files on HDFS. Therefore, the data read from these files are subject to Hive's data width restrictions. For example, suppose the SerDe parses a value for an INT column into a value that is greater than 232-1 (the maximum value for a 32-bit integer). In this case, the value is rejected even if it would fit into a Vertica's 64-bit INTEGER column because it cannot fit into Hive's 32-bit INT.

Hive measures CHAR and VARCHAR length in characters and Vertica measures them in bytes. Therefore, if multi-byte encodings are being used (like Unicode), text might be truncated in Vertica.

Once the value has been parsed and converted to a Vertica data type, it is treated as native data. This treatment can result in some confusion when comparing the results of an identical query run in Hive and in Vertica. For example, if your query adds two INT values that result in a value that is larger than 232-1, the value overflows its 32-bit INT data type, causing Hive to return an error. When running the same query with the same data in Vertica using the HCatalog Connector, the value will probably still fit within Vertica's 64-int value. Thus the addition is successful and returns a value.