Complex Data Types for favroparser
You specify the data type of a record in the Avro file using the type parameter for favroparser. The favroparser supports these complex data types: Records, Enums, Arrays, Maps, Unions, and Fixed.
This section describes attributes associated with the complex data types.
Records
Records have the following attributes:
Attribute | Description |
---|---|
name | A JSON string for the name of the record |
fields | A JSON array used to list fields. Each field is a JSON object. |
The name of each field is used as a virtual column name. If flatten_records = true and several nesting levels are present, Vertica concatenates the record names to create the key_name, as follows:
{ "type": "record", "name": "Profile", "fields" : [ {"name": "UserName", "type": "string"}, {"name": "Address", "type": "string"} ] }
{ "type": "record", "name": "Profile", "fields" : [ {VerticaUser}, {VerticaUser Address} ] }
Vertica creates virtual columns for the records as follows:
Names | Values |
---|---|
UserName | VerticaUser |
Address | VerticaUser Address |
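To illustrate how nested record names can be concatenated into a single key, here is a minimal Python sketch. It is not Vertica's implementation: the function name flatten_record and the sample data are hypothetical, and the "." separator is an assumption modeled on the flattened keys shown in the Arrays example in this section.

```python
def flatten_record(record, prefix=""):
    """Flatten nested dicts into one level, joining names with '.'."""
    flat = {}
    for name, value in record.items():
        key = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Nested record: recurse, carrying the concatenated name forward.
            flat.update(flatten_record(value, key))
        else:
            flat[key] = value
    return flat


# Hypothetical nested record, used only for illustration.
profile = {
    "UserName": "VerticaUser",
    "Address": {"Street": "Main St", "City": "Cambridge"},
}
flattened = flatten_record(profile)
print(flattened)
```

In this sketch, top-level fields keep their own names, while fields of the nested Address record receive keys such as Address.Street.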
Enums
Enums (enumerated values) use the type name enum and support the following attributes:
Attribute | Description |
---|---|
name | A JSON string for the name of the enum |
symbols | A JSON array used to list symbols as JSON strings. All symbols in an enum must be unique. |
Example:
{ "type": "enum", "name": "suit", "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"] }
Consider the preceding Avro schema with a record that contains a field with the value HEARTS. In this case, the key-value pair copied into the __raw__ column has suit as the key and HEARTS as the value.
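The uniqueness rule and the resulting key-value pair can be sketched in Python as follows. This is an illustrative check only, not the parser's code; the function name validate_enum is hypothetical, and the returned dict simply mirrors the key and value described above.

```python
def validate_enum(schema, value):
    """Check an Avro-style enum schema and one value against it."""
    symbols = schema["symbols"]
    if len(symbols) != len(set(symbols)):
        raise ValueError("enum symbols must be unique")
    if value not in symbols:
        raise ValueError(f"{value!r} is not a symbol of {schema['name']}")
    # Mirror the description above: enum name as key, symbol as value.
    return {schema["name"]: value}


suit = {
    "type": "enum",
    "name": "suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
}
enum_pair = validate_enum(suit, "HEARTS")
print(enum_pair)
```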
Arrays
Arrays use the type name array and support one attribute:
Attribute | Description |
---|---|
items | The schema of the array's items |
For example, declare an array of strings:
{"type": "array", "items": "string"}
As with Records, you can nest and flatten Arrays using flatten_arrays=true:
{
  "__name__" : "Order",                  <-- artificial __name__ key for record
  "customer_id" : "111222",
  "order_details" : {                    <-- array of records
    "0" : {                              <-- array index 0
      "__name__" : "OrderDetail",
      "product_detail" : {
        "__name__" : "Product",
        "price" : "46.21",
        "product_category" : {           <-- array of strings
          "0" : "electronics",
          "1" : "printers",
          "2" : "computers"
        },
        "product_name" : "mycompany printer 123abc",
        "product_status" : "ONLY_FEW_LEFT"
      }
    }
  },
  "order_id" : "2389646",
  "total" : "132.43"
}
Here is the result of flattening the array:
{
  "order_details.0.__name__" : "OrderDetail",
  "order_details.0.product_detail.0.product_category" : "electronics",
  "order_details.0.product_detail.1.product_category" : "printers",
  "order_details.0.product_detail.2.product_category" : "computers",
  "order_details.0.product_detail.__name__" : "Product",
  "order_details.0.product_detail.price" : "46.21",
  "order_details.0.product_detail.product_name" : "mycompany printer 123abc",
  "order_details.0.product_detail.product_status" : "ONLY_FEW_LEFT",
  "__name__" : "Order",
  "customer_id" : "111222",
  "order_id" : "2389646",
  "total" : "132.43"
}
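In the nested example above, each array appears as an object whose keys are the element indexes ("0", "1", "2"). The following Python sketch shows that convention in isolation. The function name array_to_indexed_map is hypothetical, and the sketch makes no claim about how the parser builds or orders flattened key names.

```python
def array_to_indexed_map(items):
    """Represent a list as a dict keyed by each element's index, as a string."""
    return {str(index): item for index, item in enumerate(items)}


categories = ["electronics", "printers", "computers"]
indexed = array_to_indexed_map(categories)
print(indexed)
```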
Maps
Maps use the type name map and support one attribute:
Attribute | Description |
---|---|
values | The schema of the map's values |
The favroparser treats map keys as strings. For example, you can declare a map whose values are of type long as follows:
{"type": "map", "values": "long"}
Like Records, Maps can also be nested and flattened using flatten_maps=true.
The favroparser inserts key-value pairs from the Avro map as key-value pairs in the __raw__ column. For an Avro record that has KeyX with value 10, and KeyY with value 20, favroparser loads the key-value pairs as virtual columns KeyX and KeyY, with values 10 and 20, respectively.
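The KeyX and KeyY case can be mimicked with a short Python sketch. It only models the outcome described above (map entries becoming virtual-column names and values); the function name load_map_as_columns is hypothetical and this is not the parser's code.

```python
def load_map_as_columns(avro_map):
    """Copy each map entry into a dict of virtual-column names to values."""
    columns = {}
    for key, value in avro_map.items():
        # Map keys are treated as strings, so coerce them explicitly.
        columns[str(key)] = value
    return columns


map_columns = load_map_as_columns({"KeyX": 10, "KeyY": 20})
print(map_columns)
```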
Unions
Vertica uses JSON arrays to represent Avro Unions. Consider this example:
{"name":"TransactionID","type":["string","null"]}
The field TransactionID can be a string or null.
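As a rough illustration of what a union of string and null permits, the following Python sketch tests a value against each branch. The function name matches_union is hypothetical, only the two branch types from the example are handled, and the sketch does not represent how the parser resolves unions.

```python
def matches_union(union, value):
    """Return True if value fits one branch of a union of 'string'/'null'."""
    checks = {
        "string": lambda v: isinstance(v, str),
        "null": lambda v: v is None,
    }
    # Branch types other than the two above are ignored in this sketch.
    return any(checks[branch](value) for branch in union if branch in checks)


field = {"name": "TransactionID", "type": ["string", "null"]}
print(matches_union(field["type"], "abc123"))
print(matches_union(field["type"], None))
print(matches_union(field["type"], 42))
```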
Fixed
Fixed (fixed) Avro types support two attributes:
Attribute | Description |
---|---|
name | A string for the name of this data type |
size | An integer, specifying the number of bytes per value |
For example, you can declare a 16-byte quantity:
{"type": "fixed", "size": 16, "name": "md5"}
With the preceding declaration in the Avro file schema, consider a record that contains a field with the following byte values for the key md5:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]
The favroparser loads the key-value pair as an md5 key with the preceding byte values.
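The size attribute implies that every value has exactly that many bytes. A minimal Python sketch of that check, using the md5 declaration and byte values above, follows. The function name check_fixed is hypothetical and the sketch is illustrative only.

```python
def check_fixed(schema, raw):
    """Verify that a bytes value has exactly the size the fixed schema declares."""
    if len(raw) != schema["size"]:
        raise ValueError(
            f"{schema['name']} expects {schema['size']} bytes, got {len(raw)}"
        )
    # Return the key-value pair: type name as key, raw bytes as value.
    return {schema["name"]: raw}


md5_schema = {"type": "fixed", "size": 16, "name": "md5"}
md5_value = bytes([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5])
md5_pair = check_fixed(md5_schema, md5_value)
print(len(md5_pair["md5"]))
```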