read_json¶
In [ ]:
read_json(path: str,
schema: str = "",
table_name: str = "",
usecols: list = [],
new_name: dict = {},
insert: bool = False,
start_point: str = None,
record_terminator: str = None,
suppress_nonalphanumeric_key_chars: bool = False,
reject_on_materialized_type_error: bool = False,
reject_on_duplicate: bool = False,
reject_on_empty_key: bool = False,
flatten_maps: bool = True,
flatten_arrays: bool = False,
temporary_table: bool = False,
temporary_local_table: bool = True,
gen_tmp_table_name: bool = True,
ingest_local: bool = True,
genSQL: bool = False,
materialize: bool = True,
use_complex_dt: bool = False)
Ingests a JSON file.
Parameters¶
Name | Type | Optional | Description |
---|---|---|---|
path | str | ❌ | Absolute path where the JSON file is located. |
schema | str | ✓ | Schema where the JSON file will be ingested. |
table_name | str | ✓ | Final relation name. |
usecols | list | ✓ | List of the JSON fields to ingest. All other fields are ignored. If empty, all JSON fields are ingested. |
new_name | dict | ✓ | Dictionary mapping flattened column names to new names. If the JSON file is nested, it is advised to rename the final columns, as special characters will be included in their names. For example, {"param": {"age": 3, "name": "Badr"}, "date": "1993-03-11"} creates three columns: "param.age", "param.name", and "date". You can rename these columns using the 'new_name' parameter with the following dictionary: {"param.age": "age", "param.name": "name"} |
insert | bool | ✓ | If set to True, the data is inserted into the input relation. The JSON fields must be the same as the columns of the input relation; otherwise they are not ingested. |
start_point | str | ✓ | Name of a key in the JSON data at which to begin parsing. The parser ignores all data before the start_point value. The value is loaded for each object in the file. The parser processes data after the first instance of the key, and up to the second, ignoring any remaining data. |
record_terminator | str | ✓ | When set, any invalid JSON records are skipped and parsing continues with the next record. Records must be terminated uniformly. For example, if your input file has JSON records terminated by newline characters, set this parameter to '\n'. If any invalid JSON records exist, parsing continues after the next record_terminator. Even if the data does not contain invalid records, specifying an explicit record terminator can improve load performance by allowing cooperative parse and apportioned load to operate more efficiently. When you omit this parameter, parsing ends at the first invalid JSON record. |
suppress_nonalphanumeric_key_chars | bool | ✓ | Boolean, whether to suppress non-alphanumeric characters in JSON key values. The parser replaces these characters with an underscore (_) when this parameter is true. |
reject_on_materialized_type_error | bool | ✓ | Boolean, whether to reject a data row that contains a materialized column value that cannot be coerced into a compatible data type. If the value is false and the type cannot be coerced, the parser sets the value in that column to null. If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types. |
reject_on_duplicate | bool | ✓ | Boolean, whether to ignore duplicate records (false), or to reject duplicates (true). In either case, the load continues. |
reject_on_empty_key | bool | ✓ | Boolean, whether to reject any row containing a field key without a value. |
flatten_maps | bool | ✓ | Boolean, whether to flatten sub-maps within the JSON data. Key names are concatenated with nested levels. This value is recursive and affects all data in the load. |
flatten_arrays | bool | ✓ | Boolean, whether to convert lists to sub-maps with integer keys. When lists are flattened, key names are concatenated as for maps. Lists are not flattened by default. This value affects all data in the load, including nested lists. |
temporary_table | bool | ✓ | If set to True, a temporary table will be created. |
temporary_local_table | bool | ✓ | If set to True, a temporary local table will be created and the parameter 'schema' is ignored. |
gen_tmp_table_name | bool | ✓ | If set to True, a temporary table name is generated. This parameter is used only when 'temporary_local_table' is set to True and the parameters 'table_name' and 'schema' are unspecified. |
ingest_local | bool | ✓ | If set to True, the file will be ingested from the local machine. |
genSQL | bool | ✓ | If set to True, the SQL code for creating the final table is generated but not executed. This is a good way to change the final relation types or to customize the data ingestion. |
materialize | bool | ✓ | If set to True, the flex table is materialized into a table. Otherwise, it remains a flex table. Flex tables simplify data ingestion but have worse performance compared to regular tables. |
use_complex_dt | bool | ✓ | Boolean, whether the input data file has complex structure. When this is true, most of the other parameters will be ignored. |
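The column-naming behavior described for 'flatten_maps' and 'new_name' can be illustrated in plain Python. The helpers below are a hypothetical sketch, not part of VerticaPy or Vertica's parser; they only show how nested keys are concatenated into dotted names and how a 'new_name'-style mapping renames them afterward:

```python
# Hypothetical sketch (not VerticaPy code): nested JSON keys are
# concatenated with '.' into flat column names, then a 'new_name'-style
# dictionary renames the resulting columns.

def flatten_keys(record, prefix=""):
    """Concatenate nested map keys with '.', as flatten_maps does."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_keys(value, name))
        else:
            flat[name] = value
    return flat

def rename(flat, new_name):
    """Apply a new_name-style mapping to flattened column names."""
    return {new_name.get(key, key): value for key, value in flat.items()}

record = {"param": {"age": 3, "name": "Badr"}, "date": "1993-03-11"}
flat = flatten_keys(record)
# flat == {"param.age": 3, "param.name": "Badr", "date": "1993-03-11"}
renamed = rename(flat, {"param.age": "age", "param.name": "name"})
# renamed == {"age": 3, "name": "Badr", "date": "1993-03-11"}
```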
Returns¶
vDataFrame : The vDataFrame of the relation.
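The 'record_terminator' behavior, skipping invalid records and resuming at the next terminator, can also be sketched in plain Python. This is an illustrative approximation, not Vertica's actual parser:

```python
import json

def parse_records(data, record_terminator="\n"):
    """Illustrative sketch: split the input on the terminator and skip
    chunks that are not valid JSON, continuing with the next record."""
    records = []
    for chunk in data.split(record_terminator):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            records.append(json.loads(chunk))
        except json.JSONDecodeError:
            # Invalid record: skip it and continue at the next terminator.
            continue
    return records

data = '{"a": 1}\n{bad json}\n{"a": 2}\n'
parse_records(data)
# -> [{"a": 1}, {"a": 2}]
```

Without a terminator, a single invalid record would end parsing at that point, which matches the documented behavior when this parameter is omitted.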
Example¶
In [9]:
from verticapy.utilities import *
# Ingesting the JSON file
read_json("titanic.json",
table_name = "titanic_dataset",
schema = "public")
Out[9]:
In [11]:
# Ingesting the JSON file and renaming some columns
read_json("titanic.json",
table_name = "titanic_dataset",
schema = "public",
new_name = {"fields.fare": "fare",
"fields.sex": "sex"})
Out[11]:
In [14]:
# Ingesting the JSON file, using some columns and
# renaming some columns
read_json("titanic.json",
table_name = "titanic_dataset",
schema = "public",
usecols = ["fields.fare", "fields.sex"],
new_name = {"fields.fare": "fare",
"fields.sex": "sex"})