Logical Data Types for favroparser

The favroparser supports the following logical data types, indicated in the Avro file with the "logicalType" field:

Loading Avro logical types into regular tables requires that the target column use Vertica data types that support the logical type. When you attempt to load data using an invalid logical type, the logical type is ignored and the underlying Avro type is used.

Logical Types and Unions

Vertica supports the Avro union complex type, which is used for fields that accept more than one type. Vertica recommends using a union with Avro logical types.

To implement a union with a logical type, the second element of the union array must be a JSON object consisting of a supported Avro type annotated with a logical type.

For example, the following schema uses a union to define the dob field that can either be a null value by default, or a timestamp logical type:

{
    "type": "record",
    "name": "User",
    "fields": [{
        "name": "dob", 
        "type": [ "null", {
            "type": "long",
            "logicalType": "timestamp-micros"
        }]
    }]
}

Decimal

The decimal logical type annotates the Avro bytes or fixed types. Load decimal logical types into target columns that use the following Vertica data types:

The following table describes how Vertica implements the precision and scale attributes:

Attribute Required Valid Setting Description
precision Yes 0 < precision ≤ 1024

Vertica rejects the value if:

  • The schema precision setting is greater than the precision setting for the target column.
  • For fixed types, the precision value is greater than what is allowed by the size attribute.

If the data type for the target column uses the default precision setting, the precision setting in the Avro schema overrides the default.

scale No 0 ≤ scaleprecision If omitted, the default setting is 0.

The following example uses the decimal logical type to track balance rounding on payment accounts:

{
    "type":"record",
    "name":"payments",
    "fields":[{
        "name":"id",
        "type":"int"
    }, {
        "name":"rounding", 
        "type": [ "null", {
            "type":"fixed",
            "size":4,
            "logicalType":"decimal",
            "precision":8,
            "scale":2,
            "name":"fixed_cost"
        }]
    }]
}

The following query retrieves rows containing positive and negative decimal values:

SELECT * FROM payments;
 id | rounding 
----+----------
  8 |     0.00
  9 |     0.01
 10 |    -0.01
(3 rows)

Date

The date logical type annotates the Avro integer primitive type. The integer type stores the number of days since midnight January 1, 1970 UTC.

Load values using the date logical type into target columns using the following Vertica data types:

The following schema uses the date logical type to represent a sales transaction:

{
    "type": "record",
    "name": "date",
    "namespace": "sales",
    "fields":[ 
        { "name": "orderkey", "type": "string" }, 
        { "name": "custkey", "type": "string" },
        { "name": "date", "type": [ "null", {
             "type": "int", 
             "logicalType": "date"
        }]
    }]
}

Time

Vertica supports the Avro time logical type in microsecond and millisecond precisions. Both types store the elapsed time since midnight.

time-micro annotates the Avro long primitive type. time-millis annotates the Avro int primitive type. Load values using the time logical type into target columns that use the following Vertica data types:

The time logical type does not provide a time zone value. For target columns that use the TIMETZ data type, Vertica uses UTC as the default.

For target columns using TIME/TIMETZ, values that are not within the valid limits are loaded as null values.

The following schema uses microsecond and millisecond logical types to represent user login times:

{
    "type": "record",
    "name": "logins",
    "namespace": "logs",
    "fields": [{
        "name": "id",
        "type": [ "null", "long" ]
    }, {
        "name": "login_time",
        "type": [ "null", {
            "type": "long",
            "logicalType": "time-micros"
        }]
    }, {
        "name": "login_timezone",
        "type": [ "null", {
            "type": "int",
            "logicalType": "time-millis"
        }]
    }]
}

When values are loaded into a flex table, the default UTC time zone is stored with the data:

MapToString                                                
--------------
{
    "__name__": "logins",
    "id": 1001,
    "login_time": "14:52:45.368449 UTC",
    "login_timezone": "14:52:45.368 UTC"
}

The following query retrieves a row from the logins table:

=> SELECT * FROM logins;
       id        |   login_time    |   login_timezone
-----------------+-----------------+--------------------
      1001       | 14:52:45.368449 |  14:52:45.368+00

Timestamp

Vertica supports the Avro timestamp logical type in microsecond and millisecond precisions. Both types store the elapsed time since midnight.

timestamp-micros and timestamp-millis logical types both annotate the Avro long primitive type. Load values using the timestamp logical type into target columns that use the following Vertica data types:

The following schema uses microsecond and millisecond timestamps to represent the instant that a file is modified:

{
    "type": "record",
    "name": "file_mods",
    "namespace": "logs",
    "fields": [{
        "name": "id",
        "type": [ "null", "long" ]
    }, {
        "name": "stamp",
        "type": [ "null", {
            "type": "long",
            "logicalType": "timestamp-micros"
        }]
    }, {
        "name": "stamp_tz",
        "type": [ "null", {
            "type": "long",
            "logicalType": "timestamp-millis"
        }]
    }]
}

When values are loaded into a flex table, the default UTC time zone is stored with the data:

MapToString                                                
--------------
{
    "__name__": "file_mods",
    "id": 1001,
    "stamp": "2020-08-10 14:52:45.368449 UTC",
    "stamp_tz": "2020-08-10 14:52:45.368 UTC"
}

When values are loaded into a regular table, the timezone is included only in the value annotated with timestamp-millis, and it is represented as an offset to UTC. Additionally, the millisecond values are right-extended with padded 0s:

=> SELECT * FROM file_mods;
       id      |           stamp           |          stamp_tz
---------------+---------------------------+-------------------------------
      1001     |2020-08-10 14:52:45.368449 | 2020-08-10 10:52:45.368000-04

Duration

The Avro duration logical type annotates the Avro fixed type. Load values using the duration logical types into target columns that use the following Vertica data types:

The following schema uses the duration logical type to represent when a person joined an organization:

{
    "type": "record",
    "name": "membership",
    "namespace": "logs",
    "fields": [{
        "name": "id",
        "type": "int"
    }, {
        "name": "membership_length", 
        "type": [ "null", {
            "type": "fixed",
            "size": 12,
            "logicalType": "duration", 
            "name": "fixed_membership"
        }]
    }]
}

A membership record is loaded into a flex table in the following format. The time uses millisecond precision:

MapToString                                                
--------------
{
    "__name__": "membership",
    "id": 1001,
    "membership_length": "2 months 15 days 4.112 secs"
}

The membership_length column in the following query uses DAY to MINUTE intervals:

=> SELECT * FROM membership;
      id       |      membership_length  
---------------+-----------------------------
     1001      |     75 days 4.112 secs