pandas_to_vertica¶
In [ ]:
pandas_to_vertica(df,
                  name: str = "",
                  schema: str = "",
                  dtype: dict = {},
                  parse_nrows: int = 10000,
                  temp_path: str = "",
                  insert: bool = False)
Ingests a pandas DataFrame into the Vertica database by creating a CSV file and then using flex tables.
Parameters¶
Name | Type | Optional | Description |
---|---|---|---|
df | pandas.DataFrame | ❌ | The pandas.DataFrame to ingest. |
name | str | ✓ | Name of the new relation, or of an existing relation into which the data is inserted. If unspecified, a temporary table is created. |
schema | str | ✓ | Schema of the new relation. If unspecified, the temporary schema is used. To change the temporary schema, use the set_option function. |
dtype | dict | ✓ | Dictionary of input types. Providing a dictionary can increase ingestion speed and precision. If specified, VerticaPy uses the designated input types instead of parsing the intermediate CSV and inferring the input types. |
parse_nrows | int | ✓ | If this parameter is greater than 0, VerticaPy creates and ingests a temporary file containing 'parse_nrows' lines to determine the input data types. VerticaPy then ingests the remainder of the intermediate CSV file containing the rest of the data. This method of data type identification is less accurate, but much faster for large datasets. |
temp_path | str | ✓ | The path to which the intermediate CSV file is written. This is useful in cases where the user does not have write permissions on the current directory. |
insert | bool | ✓ | If set to True, the data is inserted into the existing relation specified by name instead of a new relation. The column names of that relation and of the pandas.DataFrame must match. |
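The sampling idea behind parse_nrows and temp_path can be sketched in plain Python. This is a conceptual illustration only, not VerticaPy's implementation: write_sample_csv is a hypothetical helper, the file name is invented, and the sketch only shows how a small CSV containing the first parse_nrows rows could be written to a chosen directory so that types can be inferred from the sample.

```python
import csv
import os
import tempfile
from itertools import islice


def write_sample_csv(rows, header, directory, parse_nrows=10000):
    """Write the header plus at most `parse_nrows` rows to a CSV file.

    Hypothetical helper: it illustrates the sampling idea only and is
    not part of VerticaPy.
    """
    # The file name is an assumption made for this sketch.
    path = os.path.join(directory, "sample_for_type_inference.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        # islice stops after parse_nrows rows, so large inputs are not fully read.
        writer.writerows(islice(rows, parse_nrows))
    return path


header = ["pclass", "survived", "age"]
rows = [(1, 0, 2.0), (1, 1, 30.0), (3, 0, 25.0)]

# A writable directory plays the role of temp_path here.
sample_path = write_sample_csv(rows, header, tempfile.mkdtemp(), parse_nrows=2)

with open(sample_path, newline="") as f:
    sample_lines = list(csv.reader(f))
```

A smaller sample is cheaper to parse but may miss values that appear only later in the data, which is why passing dtype explicitly is the more precise option.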
Returns¶
vDataFrame : vDataFrame of the new relation.
Example¶
In [19]:
from verticapy.datasets import load_titanic
titanic = load_titanic()
df = titanic.to_pandas()
df
In [3]:
from verticapy.utilities import pandas_to_vertica
pandas_to_vertica(df = df, name = "titanic_pandas", schema = "public")
In [20]:
# using the dtype parameter
pandas_to_vertica(df = df,
                  name = "titanic_pandas",
                  schema = "public",
                  dtype = {"pclass": "Integer",
                           "survived": "Integer",
                           "name": "Varchar(164)",
                           "sex": "Varchar(20)",
                           "age": "Numeric(6,3)",
                           "sibsp": "Integer",
                           "parch": "Integer",
                           "ticket": "Varchar(36)",
                           "fare": "Numeric(10,5)",
                           "cabin": "Varchar(30)",
                           "embarked": "Varchar(20)",
                           "boat": "Varchar(100)",
                           "body": "Integer",
                           "home.dest": "Varchar(100)"})
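Writing a dtype dictionary by hand, as in the example above, becomes tedious for wide tables. The following is a minimal sketch of one way to draft such a dictionary from pandas dtype names. The helper suggest_dtype and the mapping PANDAS_TO_VERTICA are hypothetical and not part of VerticaPy, and the mapping is an illustrative assumption that should be reviewed before use (for example, a Numeric type may suit a column better than Float).

```python
# Assumed, non-exhaustive mapping from pandas dtype names to Vertica type names.
PANDAS_TO_VERTICA = {
    "int64": "Integer",
    "float64": "Float",
    "bool": "Boolean",
    "datetime64[ns]": "Timestamp",
}


def suggest_dtype(pandas_dtypes, varchar_lengths=None, default_varchar=80):
    """Draft a {column: Vertica type} dictionary from pandas dtype names.

    Hypothetical helper: columns whose dtype is not in the mapping
    (such as "object") fall back to Varchar with a per-column or
    default length.
    """
    varchar_lengths = varchar_lengths or {}
    dtype = {}
    for column, pandas_type in pandas_dtypes.items():
        if pandas_type in PANDAS_TO_VERTICA:
            dtype[column] = PANDAS_TO_VERTICA[pandas_type]
        else:
            length = varchar_lengths.get(column, default_varchar)
            dtype[column] = f"Varchar({length})"
    return dtype


# With a real DataFrame, the input could be built as:
#   pandas_dtypes = {c: str(t) for c, t in df.dtypes.items()}
pandas_dtypes = {"pclass": "int64", "name": "object", "fare": "float64"}
suggested = suggest_dtype(pandas_dtypes, varchar_lengths={"name": 164})
```

The resulting dictionary can then be edited and passed as the dtype argument of pandas_to_vertica.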