read_csv
read_csv(path: str,
         schema: str = "",
         table_name: str = "",
         sep: str = "",
         header: bool = True,
         header_names: list = [],
         dtype: dict = {},
         na_rep: str = "",
         quotechar: str = '"',
         escape: str = "\027",
         record_terminator: str = "\n",
         trim: bool = True,
         omit_empty_keys: bool = False,
         reject_on_duplicate: bool = False,
         reject_on_empty_key: bool = False,
         reject_on_materialized_type_error: bool = False,
         parse_nrows: int = -1,
         insert: bool = False,
         temporary_table: bool = False,
         temporary_local_table: bool = True,
         gen_tmp_table_name: bool = True,
         ingest_local: bool = True,
         genSQL: bool = False,
         materialize: bool = True)
Ingests a CSV file using flex tables.
Parameters
Name | Type | Optional | Description |
---|---|---|---|
path | str | ❌ | Absolute path where the CSV file is located. |
schema | str | ✓ | Schema where the CSV file will be ingested. |
table_name | str | ✓ | Final relation name. |
sep | str | ✓ | Column separator. If empty, the separator is guessed. This is only possible if the files are not compressed. |
header | bool | ✓ | If set to False, the 'header_names' parameter is used to name the columns. |
header_names | list | ✓ | List of the columns names. |
dtype | dict | ✓ | Dictionary of user-specified types. Providing a dictionary can increase ingestion speed and precision; when specified, VerticaPy uses the designated input types instead of parsing the file to infer them. |
na_rep | str | ✓ | Missing values representation. |
quotechar | str | ✓ | Character enclosing the string values. |
escape | str | ✓ | Character used to escape special characters in the input. |
record_terminator | str | ✓ | A single-character value used to specify the end of a record. |
trim | bool | ✓ | Boolean, specifies whether to trim white space from header names and key values. |
omit_empty_keys | bool | ✓ | Boolean, specifies how the parser handles header keys without values. If true, keys with an empty value in the header row are not loaded. |
reject_on_duplicate | bool | ✓ | Boolean, specifies whether to ignore duplicate records (False), or to reject duplicates (True). In either case, the load continues. |
reject_on_empty_key | bool | ✓ | Boolean, specifies whether to reject any row containing a key without a value. |
reject_on_materialized_type_error | bool | ✓ | Boolean, specifies whether to reject any materialized column value that the parser cannot coerce into a compatible data type. |
parse_nrows | int | ✓ | If this parameter is greater than 0, a new file of 'parse_nrows' lines is created and ingested first to identify the data types. It is then dropped and the entire file is ingested. The data type identification will be less precise, but this parameter can make the process faster for heavy files. |
insert | bool | ✓ | If set to True, the data is ingested into the input relation. Make sure your file has a header corresponding to the names of the relation's columns; otherwise, the ingestion will fail. |
temporary_table | bool | ✓ | If set to True, a temporary table will be created. |
temporary_local_table | bool | ✓ | If set to True, a temporary local table is created. The parameter 'schema' must be empty, otherwise this parameter is ignored. |
gen_tmp_table_name | bool | ✓ | If set to True, a temporary table name is generated. This parameter is only used when 'temporary_local_table' is set to True and the parameters "table_name" and "schema" are unspecified. |
ingest_local | bool | ✓ | If set to True, the file will be ingested from the local machine. |
genSQL | bool | ✓ | If set to True, the SQL code used to create the final table is generated but not executed. This is a convenient way to change the final relation's types or to customize the data ingestion. |
materialize | bool | ✓ | If set to True, the flex table is materialized into a table; otherwise, it remains a flex table. Flex tables simplify data ingestion but have worse performance than regular tables. |
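When `sep` is left empty, `read_csv` guesses the separator from the file. That behavior can be approximated locally with Python's standard `csv.Sniffer` before ingesting, for example to verify which separator will likely be detected. The `guess_sep` helper below is a hypothetical sketch, not VerticaPy's actual detection logic:

```python
import csv

# Hypothetical helper: rough stand-in for the separator guessing
# that read_csv performs when sep is left empty.
def guess_sep(sample: str) -> str:
    return csv.Sniffer().sniff(sample).delimiter

sample = "pclass,survived,name\n1,0,Allen\n3,1,Smith\n"
print(guess_sep(sample))  # ','
```

Passing the detected separator explicitly via `sep` avoids the guessing step, which is also required when the files are compressed.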
Returns
vDataFrame : The vDataFrame of the relation.
Example
from verticapy.utilities import *
# Generates the SQL needed to create the table
read_csv("titanic.csv",
         table_name = "titanic_dataset",
         schema = "public",
         quotechar = '"',
         sep = ",",
         na_rep = "",
         genSQL = True)
# Generates the SQL needed to create the table, parsing only 100 lines
read_csv("titanic.csv",
         table_name = "titanic_dataset",
         schema = "public",
         quotechar = '"',
         sep = ",",
         na_rep = "",
         parse_nrows = 100,
         genSQL = True)
# Ingests the CSV file
read_csv("titanic.csv",
         table_name = "titanic_dataset",
         schema = "public",
         quotechar = '"',
         sep = ",",
         na_rep = "",
         parse_nrows = 100)
# Inserts a new file into an existing table
read_csv("titanic.csv",
         table_name = "titanic_dataset",
         schema = "public",
         quotechar = '"',
         sep = ",",
         na_rep = "",
         parse_nrows = 100,
         insert = True)
# Creates a new table using the parameter dtype
read_csv("titanic.csv",
         table_name = "titanic_dataset",
         schema = "public",
         quotechar = '"',
         sep = ",",
         na_rep = "",
         dtype = {"pclass": "Integer",
                  "survived": "Integer",
                  "name": "Varchar(130)",
                  "sex": "Varchar(20)",
                  "age": "Numeric(5,3)",
                  "sibsp": "Integer",
                  "parch": "Integer",
                  "ticket": "Varchar(22)",
                  "fare": "Numeric(10,5)",
                  "cabin": "Varchar(22)",
                  "embarked": "Varchar(20)",
                  "boat": "Varchar(100)",
                  "body": "Integer",
                  "home.dest": "Varchar(92)"})
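A `dtype` dictionary like the one above can also be drafted programmatically. The sketch below naively infers Integer, Float, or Varchar for each column from a sample, using only the standard library. `infer_vertica_types` is a hypothetical helper illustrating the idea; the inference `read_csv` performs when `dtype` is omitted is more sophisticated:

```python
import csv
import io

# Hypothetical helper: naively map each column to a Vertica type
# (Integer < Float < Varchar) by inspecting sample rows. A rough
# stand-in for the inference read_csv does when dtype is omitted.
def infer_vertica_types(text: str) -> dict:
    order = {"Integer": 0, "Float": 1, "Varchar(100)": 2}
    types = {}
    for row in csv.DictReader(io.StringIO(text)):
        for col, val in row.items():
            if val == "":
                continue  # missing value gives no type evidence
            try:
                int(val)
                guess = "Integer"
            except ValueError:
                try:
                    float(val)
                    guess = "Float"
                except ValueError:
                    guess = "Varchar(100)"
            # keep the widest type seen so far for the column
            if order.get(types.get(col), -1) < order[guess]:
                types[col] = guess
    return types

sample = "pclass,age,name\n1,29.5,Allen\n3,2.0,Smith\n"
print(infer_vertica_types(sample))
# {'pclass': 'Integer', 'age': 'Float', 'name': 'Varchar(100)'}
```

The resulting dictionary can then be reviewed, adjusted (e.g. tightening `Varchar` sizes), and passed to `read_csv` via `dtype`.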