Logical Data Types for favroparser
The favroparser supports the following logical data types, indicated in the Avro file with the "logicalType" field:
Loading Avro logical types into regular tables requires that the target column use Vertica data types that support the logical type. When you attempt to load data using an invalid logical type, the logical type is ignored and the underlying Avro type is used.
Logical Types and Unions
Vertica supports the Avro union complex type, which is used for fields that accept more than one type. Vertica recommends using a union with Avro logical types.
To implement a union with a logical type, the second element of the union array must be a JSON object consisting of a supported Avro type annotated with a logical type.
For example, the following schema uses a union to define the dob
field that can either be a null value by default, or a timestamp logical type:
{ "type": "record", "name": "User", "fields": [{ "name": "dob", "type": [ "null", { "type": "long", "logicalType": "timestamp-micros" }] }] }
Decimal
The decimal logical type annotates the Avro bytes or fixed types. Load decimal logical types into target columns that use the following Vertica data types:
The following table describes how Vertica implements the precision and scale attributes:
Attribute | Required | Valid Setting | Description |
---|---|---|---|
precision | Yes | 0 < precision ≤ 1024 |
Vertica rejects the value if:
If the data type for the target column uses the default precision setting, the precision setting in the Avro schema overrides the default. |
scale | No | 0 ≤ scale ≤ precision | If omitted, the default setting is 0. |
The following example uses the decimal logical type to track balance rounding on payment accounts:
{ "type":"record", "name":"payments", "fields":[{ "name":"id", "type":"int" }, { "name":"rounding", "type": [ "null", { "type":"fixed", "size":4, "logicalType":"decimal", "precision":8, "scale":2, "name":"fixed_cost" }] }] }
The following query retrieves rows containing positive and negative decimal values:
SELECT * FROM payments; id | rounding ----+---------- 8 | 0.00 9 | 0.01 10 | -0.01 (3 rows)
Date
The date logical type annotates the Avro integer primitive type. The integer type stores the number of days since midnight January 1, 1970 UTC.
Load values using the date logical type into target columns using the following Vertica data types:
The following schema uses the date logical type to represent a sales transaction:
{ "type": "record", "name": "date", "namespace": "sales", "fields":[ { "name": "orderkey", "type": "string" }, { "name": "custkey", "type": "string" }, { "name": "date", "type": [ "null", { "type": "int", "logicalType": "date" }] }] }
Time
Vertica supports the Avro time logical type in microsecond and millisecond precisions. Both types store the elapsed time since midnight.
time-micro
annotates the Avro long primitive type. time-millis
annotates the Avro int primitive type. Load values using the time logical type into target columns that use the following Vertica data types:
The time logical type does not provide a time zone value. For target columns that use the TIMETZ data type, Vertica uses UTC as the default.
For target columns using TIME/TIMETZ, values that are not within the valid limits are loaded as null values.
The following schema uses microsecond and millisecond logical types to represent user login times:
{ "type": "record", "name": "logins", "namespace": "logs", "fields": [{ "name": "id", "type": [ "null", "long" ] }, { "name": "login_time", "type": [ "null", { "type": "long", "logicalType": "time-micros" }] }, { "name": "login_timezone", "type": [ "null", { "type": "int", "logicalType": "time-millis" }] }] }
When values are loaded into a flex table, the default UTC time zone is stored with the data:
MapToString -------------- { "__name__": "logins", "id": 1001, "login_time": "14:52:45.368449 UTC", "login_timezone": "14:52:45.368 UTC" }
The following query retrieves a row from the logins table:
=> SELECT * FROM logins; id | login_time | login_timezone -----------------+-----------------+-------------------- 1001 | 14:52:45.368449 | 14:52:45.368+00
Timestamp
Vertica supports the Avro timestamp logical type in microsecond and millisecond precisions. Both types store the elapsed time since midnight.
timestamp-micros
and timestamp-millis
logical types both annotate the Avro long primitive type. Load values using the timestamp logical type into target columns that use the following Vertica data types:
The following schema uses microsecond and millisecond timestamps to represent the instant that a file is modified:
{ "type": "record", "name": "file_mods", "namespace": "logs", "fields": [{ "name": "id", "type": [ "null", "long" ] }, { "name": "stamp", "type": [ "null", { "type": "long", "logicalType": "timestamp-micros" }] }, { "name": "stamp_tz", "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" }] }] }
When values are loaded into a flex table, the default UTC time zone is stored with the data:
MapToString -------------- { "__name__": "file_mods", "id": 1001, "stamp": "2020-08-10 14:52:45.368449 UTC", "stamp_tz": "2020-08-10 14:52:45.368 UTC" }
When values are loaded into a regular table, the timezone is included only in the value annotated with timestamp-millis
, and it is represented as an offset to UTC. Additionally, the millisecond values are right-extended with padded 0s:
=> SELECT * FROM file_mods; id | stamp | stamp_tz ---------------+---------------------------+------------------------------- 1001 |2020-08-10 14:52:45.368449 | 2020-08-10 10:52:45.368000-04
Duration
The Avro duration logical type annotates the Avro fixed type. Load values using the duration logical types into target columns that use the following Vertica data types:
The following schema uses the duration logical type to represent when a person joined an organization:
{ "type": "record", "name": "membership", "namespace": "logs", "fields": [{ "name": "id", "type": "int" }, { "name": "membership_length", "type": [ "null", { "type": "fixed", "size": 12, "logicalType": "duration", "name": "fixed_membership" }] }] }
A membership record is loaded into a flex table in the following format. The time uses millisecond precision:
MapToString -------------- { "__name__": "membership", "id": 1001, "membership_length": "2 months 15 days 4.112 secs" }
The membership_length
column in the following query uses DAY
to MINUTE
intervals:
=> SELECT * FROM membership; id | membership_length ---------------+----------------------------- 1001 | 75 days 4.112 secs