verticapy.read_avro

verticapy.read_avro(path: str, schema: str | None = None, table_name: str | None = None, usecols: list | None = None, new_name: dict | None = None, insert: bool = False, reject_on_materialized_type_error: bool = False, flatten_maps: bool = True, flatten_arrays: bool = False, temporary_table: bool = False, temporary_local_table: bool = True, gen_tmp_table_name: bool = True, ingest_local: bool = True, genSQL: bool = False, materialize: bool = True, use_complex_dt: bool = False) → vDataFrame

Ingests an AVRO file using flex tables.

Parameters

path: str

Absolute path where the AVRO file is located.

schema: str, optional

Schema where the AVRO file will be ingested.

table_name: str, optional

Final relation name.

usecols: list, optional

List of the AVRO parameters to ingest. All other parameters are ignored. If empty, all the AVRO parameters are ingested.

new_name: dict, optional

Dictionary of the new column names. If the AVRO file is nested, it is recommended to change the final names because special characters will be included in the new column names. For example, {"param": {"age": 3, "name": "Badr"}, "date": "1993-03-11"} creates 3 columns: "param.age", "param.name" and "date". You can rename these columns using the new_name parameter with the following dictionary: {"param.age": "age", "param.name": "name"}

insert: bool, optional

If set to True, the data is ingested into the input relation. The AVRO parameters must match the columns of the input relation; otherwise they are not ingested. If set to True, table_name cannot be empty.

reject_on_materialized_type_error: bool, optional

Boolean, whether to reject a data row that contains a materialized column value that cannot be coerced into a compatible data type. If the value is False and the type cannot be coerced, the parser sets the value in that column to None. If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types.

flatten_maps: bool, optional

Boolean, whether to flatten sub-maps within the AVRO data, separating map levels with a period (.). This value affects all data in the load, including nested maps.

flatten_arrays: bool, optional

Boolean, whether to convert lists to sub-maps with integer keys. When lists are flattened, key names are concatenated in the same way as maps. Lists are not flattened by default. This value affects all data in the load, including nested lists.

temporary_table: bool, optional

If set to True, a temporary table will be created.

temporary_local_table: bool, optional

If set to True, a temporary local table will be created. The parameter schema must be empty, otherwise this parameter is ignored.

gen_tmp_table_name: bool, optional

Sets the name of the temporary table. This parameter is only used when the parameter temporary_local_table is set to True and if the parameters table_name and schema are unspecified.

ingest_local: bool, optional

If set to True, the file will be ingested from the local machine.

genSQL: bool, optional

If set to True, the SQL code for creating the final table is generated but not executed. This is a good way to change the final relation types or to customize the data ingestion.

materialize: bool, optional

If set to True, the flex table is materialized into a table. Otherwise, it remains a flex table. Flex tables simplify data ingestion but have worse performance than regular tables.

use_complex_dt: bool, optional

Boolean, whether the input data file has a complex structure. If set to True, most of the other parameters are ignored.
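The naming rules described for flatten_maps, flatten_arrays and new_name can be illustrated with a small pure-Python sketch. This is not the Vertica parser and does not call VerticaPy; the helpers flatten_record and rename_columns are hypothetical names introduced only for this illustration, and the zero-based integer keys used for list positions are an assumption.

```python
def flatten_record(record, flatten_maps=True, flatten_arrays=False, prefix=""):
    """Illustrative only: mimic how nested keys become dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if flatten_arrays and isinstance(value, list):
            # Assumption: list positions become zero-based integer keys.
            value = {str(i): item for i, item in enumerate(value)}
        if flatten_maps and isinstance(value, dict):
            # Map levels are joined with a period (.), at every nesting depth.
            flat.update(
                flatten_record(value, flatten_maps, flatten_arrays, name + ".")
            )
        else:
            flat[name] = value
    return flat


def rename_columns(flat, new_name):
    """Apply a new_name-style mapping; unmapped columns keep their names."""
    return {new_name.get(col, col): val for col, val in flat.items()}


record = {"param": {"age": 3, "name": "Badr"}, "date": "1993-03-11"}
flat = flatten_record(record)
renamed = rename_columns(flat, {"param.age": "age", "param.name": "name"})

with_list = {"tags": ["a", "b"]}
kept = flatten_record(with_list)                          # lists left as-is
expanded = flatten_record(with_list, flatten_arrays=True)  # lists expanded
```

Here flat has the keys "param.age", "param.name" and "date", which is why renaming through new_name is recommended for nested files.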

Returns

vDataFrame

The vDataFrame of the relation.

Examples

In this example, we will first download an AVRO file and then ingest it into a Vertica database.

We import verticapy:

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

Let’s download the AVRO file.

import requests

url = "https://github.com/vertica/VerticaPy/raw/master/verticapy/tests/utilities/variants.avro"

r = requests.get(url)

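# Write the downloaded bytes to disk; the context manager closes the file.
# In the original session, write() reported 1952604 bytes written.
with open("variants.avro", "wb") as f:
    f.write(r.content)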
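Before ingesting, it can be useful to confirm that the download really is an AVRO file rather than, say, an HTML error page. The sketch below relies on the fact that AVRO object container files begin with the four bytes "Obj" followed by the byte 1; the helper names has_avro_magic and is_avro_file are hypothetical and are not part of VerticaPy.

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of an AVRO object container file


def has_avro_magic(data: bytes) -> bool:
    """Return True if the byte string starts with the AVRO magic bytes."""
    return data[: len(AVRO_MAGIC)] == AVRO_MAGIC


def is_avro_file(path: str) -> bool:
    """Read only the file header and check it for the AVRO magic bytes."""
    with open(path, "rb") as f:
        return has_avro_magic(f.read(len(AVRO_MAGIC)))
```

For example, is_avro_file("variants.avro") could be checked right after the download. Calling r.raise_for_status() on the requests response before writing the file is another way to catch a failed download early.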

Let’s ingest the AVRO file into the Vertica database.

from verticapy.core.parsers.avro import read_avro

read_avro(
    path = "variants.avro",
    table_name = "variants",
    schema = "public",
)
Output: a preview of the resulting vDataFrame. Columns and data types, in display order (several column names are missing from the rendered output and are marked as such):

column                              type
type                                Varchar(20)
sv                                  Varchar(20)
(name not shown)                    Long varchar(162500)
strand                              Varchar(20)
start                               Integer
reference                           Varchar(32)
names                               Long varchar(40)
length                              Integer
id                                  Varchar(22)
end                                 Integer
chromosome                          Integer
(name not shown)                    Long varchar(1155)
(name not shown)                    Varchar(20)
annotation.start                    Integer
annotation.reference                Varchar(32)
annotation.populationFrequencies    Long varchar(40)
annotation.minorAlleleFreq          Varchar(20)
annotation.minorAllele              Varchar(20)
annotation.id                       Varchar(20)
annotation.hgvs                     Varchar(20)
annotation.geneTraitAssociation     Long varchar(40)
annotation.geneExpression           Varchar(20)
annotation.geneDrugInteraction      Long varchar(40)
(name not shown)                    Long varchar(560)
(name not shown)                    Varchar(20)
(name not shown)                    Long varchar(800)
(name not shown)                    Long varchar(8090)
annotation.chromosome               Integer
annotation.ancestralAllele          Varchar(20)
annotation.alternate                Varchar(20)
annotation.additionalAttributes     Varchar(20)
annotation.__name__                 Varchar(34)
alternate                           Varchar(20)
__name__                            Varchar(22)

Rows shown in the preview, limited to the columns whose values vary:

row  type   strand  start     reference         length  id           end       chromosome  alternate
1    INDEL  +       16050740  A                 1       rs587747231  16050740  22          (empty)
2    INDEL  +       16051723  A                 1       rs201906224  16051723  22          (empty)
3    INDEL  +       16052395  AAAGCCAGAACCACTC  16      rs587774030  16052410  22          (empty)
4    INDEL  +       16055850  TT                2       rs587752360  16055851  22          (empty)
5    INDEL  +       16055901  T                 1       rs587649799  16055901  22          (empty)
6    INDEL  +       16056484  AG                2       rs587703083  16056485  22          (empty)
7    INDEL  +       16057193  A                 1       rs587689210  16057193  22          (empty)
8    INDEL  +       16061832  (empty)           1       rs587714792  16061832  22          A
9    INDEL  +       16063429  AG                2       rs587680732  16063430  22          (empty)
10   INDEL  +       16063482  AA                2       rs587700504  16063483  22          (empty)
11   INDEL  +       16066472  C                 1       rs587669040  16066472  22          (empty)
12   INDEL  +       16070324  T                 1       rs587727612  16070324  22          (empty)
13   INDEL  +       16080425  TA                2       rs543349252  16080426  22          (empty)
14   INDEL  +       16085566  (empty)           4       rs561100161  16085569  22          TTTC
15   INDEL  +       16140743  TATC              4       rs577706315  16140746  22          (empty)
16   INDEL  +       16141583  (empty)           1       rs545132695  16141583  22          T
17   INDEL  +       16142235  CT                2       rs554362668  16142236  22          (empty)
18   INDEL  +       16145459  T                 1       rs201309305  16145459  22          (empty)
19   SNP    +       16050075  A                 1       rs587697622  16050075  22          G
20   SNP    +       16050115  G                 1       rs587755077  16050115  22          A
21   SNP    +       16050213  C                 1       rs587654921  16050213  22          T
22   SNP    +       16050319  C                 1       rs587712275  16050319  22          T
23   SNP    +       16050527  C                 1       rs587769434  16050527  22          A
24   SNP    +       16050568  C                 1       rs587638893  16050568  22          A
25   SNP    +       16050607  G                 1       rs587720402  16050607  22          A
26   SNP    +       16050627  G                 1       rs587593704  16050627  22          T
27   SNP    +       16050646  G                 1       rs587670191  16050646  22          T
28   SNP    +       16050655  G                 1       rs587703534  16050655  22          A
29   SNP    +       16050678  C                 1       rs139377059  16050678  22          T
30   SNP    +       16050679  G                 1       rs587682556  16050679  22          A
31   SNP    +       16050688  C                 1       rs587756191  16050688  22          T
32   SNP    +       16050732  C                 1       rs587652033  16050732  22          T
33   SNP    +       16050758  T                 1       rs587684957  16050758  22          C
34   SNP    +       16050783  A                 1       rs587743568  16050783  22          G
35   SNP    +       16050840  C                 1       rs587616822  16050840  22          G
36   SNP    +       16050847  T                 1       rs587702478  16050847  22          C
37   SNP    +       16050856  G                 1       rs587754502  16050856  22          T
38   SNP    +       16050874  G                 1       rs587634452  16050874  22          T
39   SNP    +       16050922  T                 1       rs367963583  16050922  22          G
40   SNP    +       16050954  G                 1       rs587763973  16050954  22          A
41   SNP    (row truncated in the source)

In every complete row shown, sv, annotation.minorAlleleFreq, annotation.minorAllele, annotation.id, annotation.hgvs, annotation.geneExpression, annotation.ancestralAllele and annotation.additionalAttributes are [null]; names, annotation.populationFrequencies, annotation.geneTraitAssociation and annotation.geneDrugInteraction are {}; annotation.start, annotation.reference, annotation.chromosome and annotation.alternate repeat start, reference, chromosome and alternate; annotation.__name__ is VariantAnnotation and __name__ is VariantAvro. The columns whose names are not shown display no value.