
verticapy.read_avro
verticapy.read_avro(path: str, schema: str | None = None, table_name: str | None = None, usecols: list | None = None, new_name: dict | None = None, insert: bool = False, reject_on_materialized_type_error: bool = False, flatten_maps: bool = True, flatten_arrays: bool = False, temporary_table: bool = False, temporary_local_table: bool = True, gen_tmp_table_name: bool = True, ingest_local: bool = True, genSQL: bool = False, materialize: bool = True, use_complex_dt: bool = False) -> vDataFrame
Ingests an AVRO file using flex tables.
Parameters
- path: str
Absolute path where the AVRO file is located.
- schema: str, optional
Schema where the AVRO file will be ingested.
- table_name: str, optional
Final relation name.
- usecols: list, optional
List of the AVRO parameters to ingest; the others are ignored. If empty, all the AVRO parameters are ingested.
- new_name: dict, optional
Dictionary of new column names. If the AVRO file is nested, it is recommended to rename the final columns, because the generated column names include special characters. For example, the record {"param": {"age": 3, "name": "Badr"}, "date": "1993-03-11"} creates 3 columns: "param.age", "param.name" and "date". You can rename these columns by passing the following dictionary to the new_name parameter: {"param.age": "age", "param.name": "name"}
- insert: bool, optional
If set to True, the data is ingested into the input relation. The AVRO parameters must match the columns of the input relation; otherwise, they are not ingested. If set to True, table_name cannot be empty.
- reject_on_materialized_type_error: bool, optional
Whether to reject a data row that contains a materialized column value that cannot be coerced into a compatible data type. If the value is False and the type cannot be coerced, the parser sets the value in that column to None. If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types.
- flatten_maps: bool, optional
Whether to flatten sub-maps within the AVRO data, separating map levels with a period (.). This value affects all data in the load, including nested maps.
- flatten_arrays: bool, optional
Whether to convert lists to sub-maps with integer keys. When lists are flattened, key names are concatenated in the same way as maps. Lists are not flattened by default. This value affects all data in the load, including nested lists.
- temporary_table: bool, optional
If set to True, a temporary table is created.
- temporary_local_table: bool, optional
If set to True, a temporary local table is created. The parameter schema must be empty; otherwise, this parameter is ignored.
- gen_tmp_table_name: bool, optional
Whether to generate a name for the temporary table. This parameter is only used when temporary_local_table is set to True and both table_name and schema are unspecified.
- ingest_local: bool, optional
If set to True, the file is ingested from the local machine.
- genSQL: bool, optional
If set to True, the SQL code for creating the final table is generated but not executed. This is a good way to change the final relation types or to customize the data ingestion.
- materialize: bool, optional
If set to True, the flex table is materialized into a table. Otherwise, it remains a flex table. Flex tables simplify data ingestion but perform worse than regular tables.
- use_complex_dt: bool, optional
Whether the input data file has a complex structure. If set to True, most of the other parameters are ignored.
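To make the column-naming rules for flatten_maps, flatten_arrays and new_name concrete, here is a minimal pure-Python sketch that applies them to a single in-memory record. It assumes the parser behaves exactly as described above; the helpers flatten_record and rename_columns are hypothetical illustrations, not part of VerticaPy, and the real flattening happens inside Vertica during ingestion.

```python
from typing import Any, Dict, Optional


def flatten_record(
    record: Dict[str, Any],
    flatten_maps: bool = True,
    flatten_arrays: bool = False,
    prefix: str = "",
) -> Dict[str, Any]:
    """Hypothetical helper: flatten a nested record into dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if flatten_maps and isinstance(value, dict):
            # Sub-maps become "parent.child" columns.
            flat.update(flatten_record(value, flatten_maps, flatten_arrays, name + "."))
        elif flatten_arrays and isinstance(value, list):
            # Lists become sub-maps keyed by integer position: "parent.0", "parent.1", ...
            as_map = {str(i): item for i, item in enumerate(value)}
            flat.update(flatten_record(as_map, flatten_maps, flatten_arrays, name + "."))
        else:
            # Scalars (and maps/lists left unflattened) stay a single column.
            flat[name] = value
    return flat


def rename_columns(
    flat: Dict[str, Any], new_name: Optional[Dict[str, str]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: apply a new_name-style mapping; unmapped columns keep their name."""
    new_name = new_name or {}
    return {new_name.get(col, col): value for col, value in flat.items()}


record = {"param": {"age": 3, "name": "Badr"}, "date": "1993-03-11"}
flat = flatten_record(record)
print(flat)  # {'param.age': 3, 'param.name': 'Badr', 'date': '1993-03-11'}
print(rename_columns(flat, {"param.age": "age", "param.name": "name"}))
# {'age': 3, 'name': 'Badr', 'date': '1993-03-11'}

# "tags" is a made-up list field used only to illustrate flatten_arrays.
print(flatten_record({"tags": ["a", "b"]}, flatten_arrays=True))
# {'tags.0': 'a', 'tags.1': 'b'}
```

With the defaults (flatten_maps=True, flatten_arrays=False), a list such as "tags" stays a single column, which mirrors the default behavior described for flatten_arrays.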
Returns
- vDataFrame
The vDataFrame of the relation.
Examples
In this example, we first download an AVRO file and then ingest it into the Vertica database.
We import verticapy:

    import verticapy as vp
Hint
By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

Let’s download the AVRO file.
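    import requests

    # Download the sample AVRO file from the VerticaPy repository.
    url = "https://github.com/vertica/VerticaPy/raw/master/verticapy/tests/utilities/variants.avro"
    r = requests.get(url)
    r.raise_for_status()  # stop early on HTTP errors

    # Write the raw bytes to a local file.
    with open("variants.avro", "wb") as f:
        f.write(r.content)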
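Before ingesting, it can be worth checking that the download returned an Avro file rather than, for example, an HTML error page. The sketch below assumes the file uses the standard Avro object container format, whose files begin with the four bytes "Obj" followed by the byte 1; the helper name is_avro_container is hypothetical and not part of VerticaPy or requests.

```python
# Avro object container files start with the 4-byte magic b"Obj\x01"
# (ASCII "O", "b", "j" followed by the byte 1), per the Apache Avro spec.
AVRO_MAGIC = b"Obj\x01"


def is_avro_container(data: bytes) -> bool:
    """Hypothetical helper: True if `data` begins with the Avro container magic."""
    return data[: len(AVRO_MAGIC)] == AVRO_MAGIC


# Usage with the response object `r` from the download step:
#     if not is_avro_container(r.content):
#         raise ValueError("variants.avro does not look like an Avro container file")
```

Checking the bytes in memory, before they are written to disk, keeps a bad download from ever reaching read_avro.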
Let’s ingest the AVRO file into the Vertica database.
    from verticapy.core.parsers.avro import read_avro

    read_avro(
        path="variants.avro",
        table_name="variants",
        schema="public",
    )
Output (truncated): the returned vDataFrame for the "variants" relation. The nested AVRO records are flattened into dot-separated columns such as annotation.start, annotation.reference, annotation.populationFrequencies, annotation.chromosome and annotation.__name__, alongside top-level columns including type, sv, strand, start, reference, names, length, id, end, chromosome, alternate and __name__. The first rows are INDEL variants on chromosome 22 (for example, id rs587747231 with start 16050740 and reference A), followed by SNP rows.