pcsv
In [ ]:
pcsv(path: str,
     sep: str = ",",
     header: bool = True,
     header_names: list = [],
     na_rep: str = "",
     quotechar: str = '"',
     escape: str = "\027",
     record_terminator: str = "\n",
     trim: bool = True,
     omit_empty_keys: bool = False,
     reject_on_duplicate: bool = False,
     reject_on_empty_key: bool = False,
     reject_on_materialized_type_error: bool = False,
     ingest_local: bool = True,
     flex_name: str = "",
     genSQL: bool = False,)
Parses a CSV file using flex tables. It will identify the columns and their respective types.
Parameters
Name | Type | Optional | Description |
---|---|---|---|
path | str | ❌ | Absolute path where the CSV file is located. |
sep | str | ✓ | Column separator. |
header | bool | ✓ | If set to False, the parameter 'header_names' will be used to name the columns. |
header_names | list | ✓ | List of the column names. |
na_rep | str | ✓ | Missing values representation. |
quotechar | str | ✓ | Character that encloses string values. |
escape | str | ✓ | A single-character value used as an escape character, so that the next character in the data is interpreted literally. |
record_terminator | str | ✓ | A single-character value used to specify the end of a record. |
trim | bool | ✓ | Boolean, specifies whether to trim white space from header names and key values. |
omit_empty_keys | bool | ✓ | Boolean, specifies how the parser handles header keys without values. If true, keys with an empty value in the header row are not loaded. |
reject_on_duplicate | bool | ✓ | Boolean, specifies whether to ignore duplicate records (False), or to reject duplicates (True). In either case, the load continues. |
reject_on_empty_key | bool | ✓ | Boolean, specifies whether to reject any row containing a key without a value. |
reject_on_materialized_type_error | bool | ✓ | Boolean, specifies whether to reject any materialized column value that the parser cannot coerce into a compatible data type. |
ingest_local | bool | ✓ | If set to True, the file will be ingested from the local machine. |
flex_name | str | ✓ | Flex table name. |
genSQL | bool | ✓ | If set to True, the SQL code for creating the final table is generated but not executed. This is a good way to change the final relation types or to customize the data ingestion. |
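To make a few of the parsing parameters more concrete, the sketch below mimics the meaning of 'sep', 'header', 'header_names', 'quotechar', 'na_rep' and 'trim' with Python's standard csv module. It is only an analogy: the helper parse_rows and the sample text are hypothetical, it does not use Vertica's flex table parser, and pcsv's actual behaviour may differ in the details.

```python
import csv
import io


def parse_rows(text, sep=",", header=True, header_names=None,
               na_rep="", quotechar='"', trim=True):
    """Hypothetical helper: illustrates what some pcsv parameters mean.

    This is NOT how pcsv is implemented; it only uses csv.reader
    from the standard library as an analogy.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep, quotechar=quotechar)
    rows = list(reader)
    if header:
        # First row is a header; 'header_names' (if given) overrides it.
        names, data = list(header_names or rows[0]), rows[1:]
    else:
        # No header row: every row is data, names come from 'header_names'.
        names, data = list(header_names or []), rows
    if trim:
        names = [name.strip() for name in names]
    records = []
    for row in data:
        values = [value.strip() if trim else value for value in row]
        # Values equal to 'na_rep' are treated as missing.
        records.append({key: (None if value == na_rep else value)
                        for key, value in zip(names, values)})
    return records


# Made-up sample: ';' separator, a quoted value containing the separator,
# and an empty (missing) age in the last record.
sample = 'name; age\n"Smith; John"; 42\nDoe;\n'
records = parse_rows(sample, sep=";")
print(records)
```

The quoted value keeps its embedded separator, surrounding white space is trimmed, and the empty field is reported as missing.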
Returns
dict : dictionary mapping each column name to its type.
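As a minimal sketch of how such a dictionary could be used, the snippet below selects the columns whose type starts with a given prefix. The dictionary contents and the helper columns_of_kind are hypothetical; the exact type strings returned by pcsv depend on the file and are assumed here for illustration only.

```python
# Hypothetical return value: column names and type strings are made up.
column_types = {
    "pclass": "Integer",
    "name": "Varchar(164)",
    "age": "Numeric(6,3)",
    "fare": "Numeric(10,5)",
}


def columns_of_kind(types, prefix):
    """Return the columns whose type string starts with 'prefix' (case-insensitive)."""
    return [column for column, ctype in types.items()
            if ctype.lower().startswith(prefix.lower())]


numeric_cols = columns_of_kind(column_types, "numeric")
print(numeric_cols)
```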
Example
In [32]:
from verticapy.utilities import *
# Parses the CSV file
pcsv("titanic.csv",
     sep = ",",
     na_rep = "")
In [35]:
# You can also rename the columns, or name them if the file has
# no header, by using the parameter 'header_names'
pcsv("titanic.csv",
     sep = ",",
     na_rep = "",
     header = True,
     header_names = ["new_name1", "new_name2"])