pcsv

pcsv(path: str,
     sep: str = ",",
     header: bool = True,
     header_names: list = [],
     na_rep: str = "",
     quotechar: str = '"',
     escape: str = "\027",
     record_terminator: str = "\n",
     trim: bool = True,
     omit_empty_keys: bool = False,
     reject_on_duplicate: bool = False,
     reject_on_empty_key: bool = False,
     reject_on_materialized_type_error: bool = False,
     ingest_local: bool = True,
     flex_name: str = "",
     genSQL: bool = False,)

Parses a CSV file using Vertica flex tables, identifying the columns and their respective types.
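To make "identifying the columns and their types" concrete, here is a rough local analogue in plain Python. It is a sketch under stated assumptions: the inference rules (Integer when every present value is an integer, Numeric(precision,scale) sized from the digits seen, otherwise Varchar sized to the longest value) are hypothetical and are not Vertica's flex-table inference. `infer_type` and `infer_csv_types` are illustrative names, not VerticaPy functions.

```python
import csv
import io
import re

# Hypothetical inference rules for illustration only; they do not
# reproduce how Vertica flex tables guess column types.
INT_RE = re.compile(r"[+-]?(\d+)")
DEC_RE = re.compile(r"[+-]?(\d*)\.(\d+)")


def infer_type(values, na_rep=""):
    """Return a Vertica-style type name for one column of string values."""
    present = [v for v in values if v != na_rep]
    if not present:
        return "Varchar(1)"  # arbitrary choice for an all-missing column
    if all(INT_RE.fullmatch(v) for v in present):
        return "Integer"
    int_digits, scale = 0, 0
    for v in present:
        m = INT_RE.fullmatch(v)
        if m:
            int_digits = max(int_digits, len(m.group(1)))
            continue
        m = DEC_RE.fullmatch(v)
        if m is None:
            # At least one value is not numeric: fall back to text.
            return f"Varchar({max(len(v) for v in present)})"
        int_digits = max(int_digits, len(m.group(1)))
        scale = max(scale, len(m.group(2)))
    return f"Numeric({int_digits + scale},{scale})"


def infer_csv_types(text, sep=",", na_rep=""):
    """Map each header name to an inferred type, shaped like pcsv's result."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, data = rows[0], rows[1:]
    columns = zip(*data)  # transpose rows into columns
    return {name: infer_type(list(col), na_rep) for name, col in zip(header, columns)}
```

For example, `infer_csv_types("pclass,name,age\n1,Allen,29\n3,Smith,0.9167\n2,,\n")` returns `{'pclass': 'Integer', 'name': 'Varchar(5)', 'age': 'Numeric(6,4)'}`.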

Parameters

path (str, required): Absolute path where the CSV file is located.
sep (str, optional): Column separator.
header (bool, optional): If set to False, the parameter 'header_names' is used to name the columns.
header_names (list, optional): List of the column names.
na_rep (str, optional): Representation of missing values.
quotechar (str, optional): Character used to enclose string values.
escape (str, optional): Escape character; the character that follows it is interpreted literally.
record_terminator (str, optional): A single-character value used to specify the end of a record.
trim (bool, optional): Specifies whether to trim white space from header names and key values.
omit_empty_keys (bool, optional): Specifies how the parser handles header keys without values. If True, keys with an empty value in the header row are not loaded.
reject_on_duplicate (bool, optional): Specifies whether to ignore duplicate records (False) or to reject them (True). In either case, the load continues.
reject_on_empty_key (bool, optional): Specifies whether to reject any row containing a key without a value.
reject_on_materialized_type_error (bool, optional): Specifies whether to reject any materialized column value that the parser cannot coerce into a compatible data type.
ingest_local (bool, optional): If set to True, the file is ingested from the local machine.
flex_name (str, optional): Name of the flex table.
genSQL (bool, optional): If set to True, the SQL code for creating the final table is generated but not executed. This is a good way to change the column types of the final table or to customize the data ingestion.
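Several of these options (sep, quotechar, escape, na_rep, trim) have close counterparts in Python's standard csv module, which makes it easy to experiment with their effect locally. The sketch below is only an analogue: pcsv relies on Vertica flex tables, not on Python, and `parse_records` is a hypothetical helper whose keyword names merely mirror pcsv's. record_terminator has no direct equivalent because `csv.reader` always recognizes `\n` and `\r\n` line endings.

```python
import csv
import io


def parse_records(text, sep=",", quotechar='"', escape="\027", na_rep="", trim=True):
    """Split CSV text into rows; values equal to na_rep become None."""
    # Local stand-in for the flex-table parser: the keyword names mirror
    # pcsv's parameters, but the parsing itself is Python's csv module.
    reader = csv.reader(
        io.StringIO(text),
        delimiter=sep,
        quotechar=quotechar,
        escapechar=escape,  # on reading, removes special meaning from the next character
    )
    rows = []
    for record in reader:
        values = [v.strip() if trim else v for v in record]
        rows.append([None if v == na_rep else v for v in values])
    return rows


rows = parse_records(
    'id;name;note\n1;"Smith; John";NA\n2; O\\;Brien ;ok\n',
    sep=";", escape="\\", na_rep="NA",
)
```

Here the quoted field keeps its embedded separator, the escaped `\;` is read as a literal semicolon, trimming removes the spaces around `O;Brien`, and `NA` becomes `None`.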

Returns

dict : dictionary mapping each column name to its inferred type.
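Because the result is a plain {column: type} dictionary, it can drive DDL of your own, for instance after adjusting a type you disagree with. The helper below is hypothetical and is not the SQL that genSQL=True produces; it only illustrates one way to turn the dictionary into a CREATE TABLE statement, assuming the table name is passed in verbatim (so "schema.table" works) and column names are double-quoted.

```python
def create_table_sql(table_name, column_types):
    """Build a CREATE TABLE statement from a {column: type} dictionary."""
    # Column names are double-quoted (embedded quotes doubled) so names such
    # as "home.dest" stay valid identifiers; table_name is used verbatim.
    columns = ",\n".join(
        '    "{}" {}'.format(name.replace('"', '""'), dtype)
        for name, dtype in column_types.items()
    )
    return f"CREATE TABLE {table_name} (\n{columns}\n);"


sql = create_table_sql("titanic", {"age": "Numeric(6,3)", "home.dest": "Varchar(100)"})
```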

Example

In [32]:
from verticapy.utilities import *
# Parses the CSV file 
pcsv("titanic.csv", 
     sep = ",",
     na_rep = "")
Out[32]:
{'age': 'Numeric(6,3)',
 'boat': 'Varchar(100)',
 'body': 'Integer',
 'cabin': 'Varchar(30)',
 'embarked': 'Varchar(20)',
 'fare': 'Numeric(10,5)',
 'home.dest': 'Varchar(100)',
 'name': 'Varchar(164)',
 'parch': 'Integer',
 'pclass': 'Integer',
 'sex': 'Varchar(20)',
 'sibsp': 'Integer',
 'survived': 'Integer',
 'ticket': 'Varchar(36)'}
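A common next step is to split the parsed columns by kind, for example to pick the numeric ones for a model. A minimal sketch, assuming the type strings look like the ones above (a name optionally followed by a parenthesized size); `group_by_type_family` is a hypothetical helper, not part of VerticaPy.

```python
from collections import defaultdict


def group_by_type_family(column_types):
    """Group column names by type family, e.g. 'Varchar(100)' -> 'varchar'."""
    groups = defaultdict(list)
    for name, dtype in column_types.items():
        # Keep only the part before any "(precision,scale)" or "(length)" suffix.
        family = dtype.split("(", 1)[0].lower()
        groups[family].append(name)
    return {family: sorted(names) for family, names in groups.items()}


# The dictionary returned by the first example above.
titanic_types = {
    'age': 'Numeric(6,3)', 'boat': 'Varchar(100)', 'body': 'Integer',
    'cabin': 'Varchar(30)', 'embarked': 'Varchar(20)', 'fare': 'Numeric(10,5)',
    'home.dest': 'Varchar(100)', 'name': 'Varchar(164)', 'parch': 'Integer',
    'pclass': 'Integer', 'sex': 'Varchar(20)', 'sibsp': 'Integer',
    'survived': 'Integer', 'ticket': 'Varchar(36)',
}
groups = group_by_type_family(titanic_types)
```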
In [35]:
# You can also rename the columns, or name them if the file has
# no header, by using the parameter 'header_names'. Columns not
# covered by 'header_names' get default names (ucol2, ucol3, ...).
pcsv("titanic.csv", 
     sep = ",",
     na_rep = "",
     header = True,
     header_names = ["new_name1", "new_name2"])
Out[35]:
{'new_name1': 'Integer',
 'new_name2': 'Integer',
 'ucol10': 'Varchar(20)',
 'ucol11': 'Varchar(100)',
 'ucol12': 'Integer',
 'ucol13': 'Varchar(100)',
 'ucol2': 'Varchar(164)',
 'ucol3': 'Varchar(20)',
 'ucol4': 'Numeric(6,3)',
 'ucol5': 'Integer',
 'ucol6': 'Integer',
 'ucol7': 'Varchar(36)',
 'ucol8': 'Numeric(10,5)',
 'ucol9': 'Varchar(30)'}
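The output suggests the naming rule: the first len(header_names) columns take the supplied names, and every remaining column is called "ucol" followed by its zero-based position (the dictionary display merely sorts the keys, so 'ucol10' appears before 'ucol2'). The sketch below encodes that rule as inferred from this output, not from VerticaPy's source; `resolve_column_names` is a hypothetical helper.

```python
def resolve_column_names(n_columns, header_names):
    """Return one name per column: header_names first, then ucol<index>."""
    # Assumption inferred from the output above: unnamed columns are called
    # "ucol" followed by their zero-based position in the file.
    return [
        header_names[i] if i < len(header_names) else f"ucol{i}"
        for i in range(n_columns)
    ]


# The titanic file has 14 columns, as the first example's output shows.
names = resolve_column_names(14, ["new_name1", "new_name2"])
```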

See Also

read_csv Ingests a CSV file into the Vertica database.
read_json Ingests a JSON file into the Vertica database.