read_file
In [ ]:
read_file(path: str,
          schema: str = "",
          table_name: str = "",
          dtype: dict = {},
          unknown: str = "varchar",
          varchar_varbinary_length: int = 80,
          insert: bool = False,
          temporary_table: bool = False,
          temporary_local_table: bool = True,
          gen_tmp_table_name: bool = True,
          ingest_local: bool = False,
          genSQL: bool = False,
          max_files: int = 100,)
Inspects and ingests a file in CSV, Parquet, ORC, JSON, or Avro format. This function uses Vertica complex data types. To create a new table, the file must be located on the server.
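As a quick pre-flight check, the list of supported formats above can be compared against a path's extension before calling read_file. The sketch below is only an illustration: `guess_format` and `SUPPORTED_FORMATS` are hypothetical names that are not part of VerticaPy, and it assumes the format can be read from the file extension, which may not match how read_file itself decides.

```python
from pathlib import PurePosixPath

# Formats listed in the description of read_file.
SUPPORTED_FORMATS = {"csv", "parquet", "orc", "json", "avro"}


def guess_format(path):
    """Hypothetical helper (not part of VerticaPy).

    Returns the lower-cased extension of `path` if it is one of the
    supported formats, otherwise None. Works on globs as well, since
    only the final suffix is examined.
    """
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix if suffix in SUPPORTED_FORMATS else None


json_format = guess_format("laliga/*.json")
parquet_format = guess_format("data/sales.PARQUET")
unsupported = guess_format("notes.txt")

print(json_format, parquet_format, unsupported)
```

A path that yields None here is probably worth double-checking before it is passed to read_file.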
Parameters
Name | Type | Optional | Description |
---|---|---|---|
path | str | ❌ | Path to a file or glob. Valid paths include any path that is valid for COPY and that uses a file format supported by this function. When inferring the data types, only one file is read, even if a glob specifies multiple files. For JSON, however, more than one file may be read to infer the data types. |
schema | str | ✓ | Schema in which to create the table. |
table_name | str | ✓ | Name of the table to create. If empty, the file name is used. |
dtype | dict | ✓ | Dictionary of custom data types. The predicted data types are replaced by the input data types. The dictionary must use column names as keys and the new data types as values. |
unknown | str | ✓ | Type used to replace unknown data types. |
varchar_varbinary_length | int | ✓ | Default length of varchar and varbinary columns. |
insert | bool | ✓ | If set to True, the data is ingested into the input relation. When you set this parameter to True, most of the parameters are ignored. |
temporary_table | bool | ✓ | If set to True, a temporary table will be created. |
temporary_local_table | bool | ✓ | If set to True, a temporary local table will be created. The parameter 'schema' must be empty; otherwise, this parameter is ignored. |
gen_tmp_table_name | bool | ✓ | If set to True, a name is generated for the temporary table. This parameter is only used when the parameter 'temporary_local_table' is set to True and the parameters 'table_name' and 'schema' are unspecified. |
ingest_local | bool | ✓ | If set to True, the file is ingested from the local machine. This currently only works for data insertion. |
genSQL | bool | ✓ | If set to True, the SQL code for creating the final table is generated but not executed. This is a good way to change the final relation types or to customize the data ingestion. |
max_files | int | ✓ | (JSON only.) If path is a glob, specifies maximum number of files in path to inspect. Use this parameter to increase the amount of data the function considers. This can be beneficial if you suspect variation among files. Files are chosen arbitrarily from the glob. |
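The interaction between dtype, unknown, and varchar_varbinary_length can be pictured with a small pure-Python sketch. This is not VerticaPy's implementation: `resolve_column_types` is a hypothetical helper, the inferred types in the example are invented for illustration, and the order in which the rules apply (user override first, then the unknown fallback, then the default length) is an assumption.

```python
def resolve_column_types(inferred, dtype=None, unknown="varchar",
                         varchar_varbinary_length=80):
    """Hypothetical helper (not part of VerticaPy).

    `inferred` maps column names to a predicted type, or None when the
    type could not be determined.
    """
    dtype = dtype or {}
    resolved = {}
    for column, inferred_type in inferred.items():
        # A user-supplied type always replaces the predicted one.
        if column in dtype:
            resolved[column] = dtype[column]
            continue
        # Columns whose type is unknown fall back to `unknown`.
        column_type = inferred_type if inferred_type else unknown
        # Bare varchar / varbinary types receive the default length.
        if column_type.lower() in ("varchar", "varbinary"):
            column_type = f"{column_type}({varchar_varbinary_length})"
        resolved[column] = column_type
    return resolved


# Invented predictions; "metadata" is a made-up column with no known type.
inferred = {"competition_name": "varchar",
            "home_team_id": "int",
            "metadata": None}

resolved = resolve_column_types(
    inferred,
    dtype={"competition_name": "varchar(30)", "home_team_id": "float"},
    varchar_varbinary_length=50,
)
print(resolved)
```

Under these assumptions, the two overridden columns keep exactly the types passed in dtype, while the column with no known type becomes a varchar of the default length.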
Returns
vDataFrame : The vDataFrame of the relation.
Example
In [3]:
from verticapy.utilities import read_file

# Generate the SQL needed to create the table
read_file("laliga/*.json",
          table_name = "laliga",
          schema = "public",
          genSQL = True)
In [5]:
# Generate the SQL needed to create the table, overriding
# some column types and the default varchar length
read_file("laliga/*.json",
          table_name = "laliga",
          schema = "public",
          genSQL = True,
          dtype = {"competition_name": "varchar(30)",
                   "home_team_id": "float"},
          varchar_varbinary_length = 50)
In [8]:
# Ingest the file
read_file("laliga/*.json",
          table_name = "laliga",
          schema = "public",
          dtype = {"competition_name": "varchar(30)",
                   "home_team_id": "float"},
          varchar_varbinary_length = 50)