Complex Data Types for favroparser

You specify the data type of a record in the Avro file using the type parameter for favroparser. The favroparser supports these complex data types:

  • Records
  • Enums
  • Arrays
  • Maps
  • Unions
  • Fixed

This section describes the attributes associated with each complex data type.

Records

Records have the following attributes:

Attribute   Description
name        A JSON string for the name of the record
fields      A JSON array used to list fields. Each field is a JSON object:

              • name: A JSON string for the name of the field
              • type: A JSON object used to define a schema or a JSON string used for naming a record definition

The name of each field is used as a virtual column name. If flatten_records = true and several nesting levels are present, Vertica concatenates the record names to create the key_name. Consider the following schema, followed by a record that uses it:

{
  "type": "record",
  "name": "Profile",
  "fields" : [
      {"name": "UserName", "type": "string"},
      {"name": "Address", "type": "string"}
   ]
}
{
  "UserName": "VerticaUser",
  "Address": "VerticaUser Address"
}

Vertica creates virtual columns for the records as follows:

Names       Values
UserName    VerticaUser
Address     VerticaUser Address
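
The concatenation of nested record names described above can be sketched in Python. This is an illustration only, not Vertica's implementation: the function name flatten_record, the nested Address record, and its Street and City fields are hypothetical, and the "." delimiter is an assumption taken from the flattened array listing later in this section.

```python
def flatten_record(record, prefix="", sep="."):
    """Flatten nested dicts into one level, joining names with `sep`.

    `sep` is an assumption; this sketch only mimics the key style
    shown in the flattened listing in the Arrays section.
    """
    flat = {}
    for name, value in record.items():
        key = f"{prefix}{sep}{name}" if prefix else name
        if isinstance(value, dict):
            # Nested record: recurse and carry the concatenated name down.
            flat.update(flatten_record(value, key, sep))
        else:
            flat[key] = value
    return flat


# Hypothetical record: Profile with a nested Address record.
profile = {
    "UserName": "VerticaUser",
    "Address": {"Street": "1 Main St", "City": "Cambridge"},
}
flat_profile = flatten_record(profile)
```

With this input, the nested fields end up under the keys Address.Street and Address.City, while UserName keeps its own name because it sits at the top level.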

Enums

Enums (enumerated values) use the type name enum and support the following attributes:

Attribute   Description
name        A JSON string for the name of the enum
symbols     A JSON array that lists the symbols as JSON strings. Every symbol in an enum must be unique.

Example:

{   
	"type": "enum",
	"name": "suit",
	"symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
}

Consider the preceding Avro schema with a record that contains a field with the value HEARTS. In this case, the key-value pair copied into the __raw__ column has suit as the key and HEARTS as the value.
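
The uniqueness rule and the resulting key-value pair can be sketched in Python. The helper names check_enum_schema and enum_key_value are hypothetical, and using the enum name as the key is an assumption that follows the suit example above; this is not how favroparser is implemented.

```python
def check_enum_schema(schema):
    """Raise ValueError if the enum schema lists a symbol more than once."""
    symbols = schema["symbols"]
    if len(symbols) != len(set(symbols)):
        raise ValueError(f"duplicate symbols in enum {schema['name']!r}")


def enum_key_value(schema, value):
    """Return the (key, value) pair for a value of this enum.

    Assumption: the key is the enum name, as in the suit example.
    """
    if value not in schema["symbols"]:
        raise ValueError(f"{value!r} is not a symbol of {schema['name']!r}")
    return schema["name"], value


suit_schema = {
    "type": "enum",
    "name": "suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
}
check_enum_schema(suit_schema)
pair = enum_key_value(suit_schema, "HEARTS")
```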

Arrays

Arrays use the type name array and support one attribute:

Attribute   Description
items       The schema of the array's items

For example, declare an array of strings:

{"type": "array", "items": "string"}

As with records, you can nest arrays and flatten them using flatten_arrays=true. The following example shows a record that contains nested arrays, before flattening:

{
    "__name__" : "Order",                      <-- artificial __name__ key for record
    "customer_id" : "111222",
    "order_details" : {                        <-- array of records
        "0" : {                                <-- array index 0
            "__name__" : "OrderDetail",
            "product_detail" : {
                "__name__" : "Product",
                "price" : "46.21",
                "product_category" : {         <-- array of strings
                    "0" : "electronics",
                    "1" : "printers",
                    "2" : "computers"
                },
                "product_name" : "mycompany printer 123abc",
                "product_status" : "ONLY_FEW_LEFT"
            }
        }
    },
    "order_id" : "2389646",
    "total" : "132.43"
}

Here is the result of flattening the array:

{
    "order_details.0.__name__" : "OrderDetail",
    "order_details.0.product_detail.0.product_category" : "electronics",
    "order_details.0.product_detail.1.product_category" : "printers",
    "order_details.0.product_detail.2.product_category" : "computers",
    "order_details.0.product_detail.__name__" : "Product",
    "order_details.0.product_detail.price" : "46.21",
    "order_details.0.product_detail.product_name" : "mycompany printer 123abc",
    "order_details.0.product_detail.product_status" : "ONLY_FEW_LEFT",
    "__name__" : "Order",
    "customer_id" : "111222",
    "order_id" : "2389646",
    "total" : "132.43"
}
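
The index-keyed form of arrays in the unflattened listing (where each array becomes an object with keys "0", "1", and so on) can be sketched in Python. The function name index_arrays is hypothetical, and the sketch covers only that intermediate representation; it does not attempt to reproduce the exact key names that Vertica generates when flattening.

```python
def index_arrays(value):
    """Recursively replace lists with dicts keyed by element index (as a string)."""
    if isinstance(value, list):
        return {str(i): index_arrays(item) for i, item in enumerate(value)}
    if isinstance(value, dict):
        return {key: index_arrays(item) for key, item in value.items()}
    return value


# Fragment of the product detail from the example above.
product = {
    "price": "46.21",
    "product_category": ["electronics", "printers", "computers"],
}
indexed = index_arrays(product)
```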

Maps

Maps use the type name map and support one attribute:

Attribute   Description
values      The schema of the map's values

The favroparser treats map keys as strings. For example, you can declare a map whose values are of type long as follows:

{"type": "map", "values": "long"}

As with records, you can nest maps and flatten them using flatten_maps=true.

The favroparser inserts key-value pairs from the Avro map as key-value pairs in the __raw__ column. For an Avro record that has KeyX with value 10, and KeyY with value 20, favroparser loads the key-value pairs as virtual columns KeyX and KeyY, with values 10 and 20, respectively.
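
As a rough illustration, the KeyX and KeyY example can be checked against the {"type": "map", "values": "long"} declaration in Python. The helper name is_long_map is hypothetical; the range check reflects Avro's definition of long as a 64-bit signed integer. The final dictionary merely stands in for the virtual columns and is not favroparser output.

```python
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1  # Avro long: 64-bit signed integer


def is_long_map(value):
    """Return True if `value` fits {"type": "map", "values": "long"}."""
    if not isinstance(value, dict):
        return False
    for key, item in value.items():
        if not isinstance(key, str):
            return False  # map keys are always strings
        if isinstance(item, bool) or not isinstance(item, int):
            return False  # bool is a subclass of int in Python, so exclude it
        if not LONG_MIN <= item <= LONG_MAX:
            return False
    return True


avro_map = {"KeyX": 10, "KeyY": 20}
valid = is_long_map(avro_map)
# Stand-in for the virtual columns: one column per map key.
virtual_columns = dict(avro_map)
```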

Unions

Vertica uses JSON arrays to represent Avro Unions. Consider this example:

{"name":"TransactionID","type":["string","null"]}

The field TransactionID can be a string or null.
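
A minimal Python sketch of what the ["string","null"] union permits follows. The function name matches_union and the sample value "tx-1001" are hypothetical, and only the two branch types from this example are handled.

```python
def matches_union(value, branches):
    """Return True if `value` matches at least one branch of the union.

    Only the "string" and "null" branches from the example are supported.
    """
    checks = {
        "string": lambda v: isinstance(v, str),
        "null": lambda v: v is None,
    }
    return any(checks[branch](value) for branch in branches if branch in checks)


transaction_id_type = ["string", "null"]
string_ok = matches_union("tx-1001", transaction_id_type)
null_ok = matches_union(None, transaction_id_type)
int_ok = matches_union(42, transaction_id_type)
```

Here string_ok and null_ok are True, while int_ok is False because an integer matches neither branch.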

Fixed

Fixed (fixed) Avro types support two attributes:

Attribute   Description
name        A string for the name of this data type
size        An integer, specifying the number of bytes per value

For example, you can declare a 16-byte quantity:

{"type": "fixed", "size": 16, "name": "md5"}

With the preceding declaration in the Avro file schema, consider a record that contains a field with the following byte values for the key md5:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]

The favroparser loads the key-value pair as an md5 key with the preceding byte values.
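
The size constraint of a fixed type can be sketched in Python as follows. The helper name check_fixed is hypothetical; the sketch only verifies that a value has exactly the declared number of bytes and says nothing about how favroparser stores it.

```python
def check_fixed(schema, value):
    """Return `value` if its length equals the declared size, else raise ValueError."""
    if len(value) != schema["size"]:
        raise ValueError(
            f"{schema['name']!r} expects {schema['size']} bytes, got {len(value)}"
        )
    return value


md5_schema = {"type": "fixed", "size": 16, "name": "md5"}
md5_value = bytes([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5])
loaded = {md5_schema["name"]: check_fixed(md5_schema, md5_value)}
```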