read_avro¶
In [ ]:
read_avro(path: str,
schema: str = "",
table_name: str = "",
usecols: list = [],
new_name: dict = {},
insert: bool = False,
reject_on_materialized_type_error: bool = False,
flatten_maps: bool = True,
flatten_arrays: bool = False,
temporary_table: bool = False,
temporary_local_table: bool = True,
gen_tmp_table_name: bool = True,
ingest_local: bool = True,
genSQL: bool = False,
materialize: bool = True,
use_complex_dt: bool = False)
Ingests an AVRO file.
Parameters¶
Name | Type | Optional | Description |
---|---|---|---|
path | str | ❌ | Absolute path where the AVRO file is located. |
schema | str | ✓ | Schema where the AVRO file will be ingested. |
table_name | str | ✓ | Final relation name. |
usecols | list | ✓ | List of the AVRO parameters to ingest. The other ones will be ignored. If empty, all the AVRO parameters will be ingested. |
new_name | dict | ✓ | Dictionary of the new column names. If the AVRO file is nested, it is advised to change the final names, as special characters will be included. For example, {"param": {"age": 3, "name": "Badr"}, "date": "1993-03-11"} will create 3 columns: "param.age", "param.name" and "date". You can rename these columns using the 'new_name' parameter with the following dictionary: {"param.age": "age", "param.name": "name"} |
insert | bool | ✓ | If set to True, the data will be ingested into the input relation. The AVRO parameters must match those of the input relation; otherwise, the data will not be ingested. |
reject_on_materialized_type_error | bool | ✓ | Boolean, whether to reject a data row that contains a materialized column value that cannot be coerced into a compatible data type. If the value is false and the type cannot be coerced, the parser sets the value in that column to null. If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types. |
flatten_maps | bool | ✓ | Boolean, whether to flatten all Avro maps. Key names are concatenated with nested levels. This value is recursive and affects all data in the load. |
flatten_arrays | bool | ✓ | Boolean, whether to convert lists to sub-maps with integer keys. When lists are flattened, key names are concatenated as for maps. Lists are not flattened by default. This value affects all data in the load, including nested lists. |
temporary_table | bool | ✓ | If set to True, a temporary table will be created. |
temporary_local_table | bool | ✓ | If set to True, a temporary local table will be created and the parameter 'schema' is ignored. |
gen_tmp_table_name | bool | ✓ | If set to True, a unique name is generated for the temporary table. This parameter is only used when the parameter 'temporary_local_table' is set to True and the parameters "table_name" and "schema" are unspecified. |
ingest_local | bool | ✓ | If set to True, the file will be ingested from the local machine. |
genSQL | bool | ✓ | If set to True, the SQL code for creating the final table is generated but not executed. This is a good way to change the final relation types or to customize the data ingestion (see the sketch after this table). |
materialize | bool | ✓ | If set to True, the flex table is materialized into a table. Otherwise, it will remain a flex table. Flex tables simplify data ingestion but have worse performance than regular tables. |
use_complex_dt | bool | ✓ | Boolean, whether the input data file has complex structure. When this is true, most of the other parameters will be ignored. |
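For instance, 'genSQL' can be used to review the generated statements before anything is created. The following is a minimal sketch, assuming an active VerticaPy connection and a local "titanic.avro" file; the handling of the return value is an assumption, as depending on the VerticaPy version the generated SQL may be printed rather than returned.
In [ ]:
from verticapy.utilities import read_avro
# Preview the SQL that would create the final table, without executing it.
# Assumes an active VerticaPy connection and a local "titanic.avro" file.
generated = read_avro("titanic.avro",
                      table_name = "titanic_dataset",
                      schema = "public",
                      genSQL = True)
# Depending on the VerticaPy version, the statements may be printed
# directly or returned; handle both cases.
if generated:
    for statement in generated:
        print(statement)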
Returns¶
vDataFrame : The vDataFrame of the relation.
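The returned vDataFrame can be used directly for exploration. A minimal sketch, assuming an active VerticaPy connection and a local "titanic.avro" file:
In [ ]:
from verticapy.utilities import read_avro
# The result is a vDataFrame pointing to the newly created relation.
vdf = read_avro("titanic.avro",
                table_name = "titanic_dataset",
                schema = "public")
# Standard vDataFrame methods can be called on the result.
vdf.head(5)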
Example¶
In [9]:
from verticapy.utilities import *
# Ingesting the AVRO file
read_avro("titanic.avro",
table_name = "titanic_dataset",
schema = "public")
In [11]:
# Ingesting the AVRO file and renaming some columns
read_avro("titanic.avro",
table_name = "titanic_dataset",
schema = "public",
new_name = {"fields.fare": "fare",
"fields.sex": "sex"})
In [14]:
# Ingesting the AVRO file, using some columns and
# renaming some columns
read_avro("titanic.avro",
table_name = "titanic_dataset",
schema = "public",
usecols = ["fields.fare", "fields.sex"],
new_name = {"fields.fare": "fare",
"fields.sex": "sex"})