vDataFrame¶
In [ ]:
vDataFrame(input_relation = "",
columns: list = [],
usecols: list = [],
schema: str = "",
sql: str = "",
external: bool = False,
symbol: str = "$",
sql_push_ext: bool = True,
empty: bool = False)
The vDataFrame is a Python object that lets you prepare and explore your data without modifying it. When you make "changes" to your data, the vDataFrame records them as SQL queries and sends those queries to your Vertica database, which aggregates the data and returns the final result. For each column of the relation, the vDataFrame creates a virtual column (vColumn) that stores the column's alias and all user modifications to that column.
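To make the "record, don't modify" idea concrete, here is a minimal sketch of a lazy frame that stores column expressions and only builds SQL when a result is requested. It is not VerticaPy's implementation: the `LazyFrame` class, its methods, and the in-memory SQLite database standing in for Vertica are all assumptions made for illustration.

```python
import sqlite3


class LazyFrame:
    """Toy stand-in for a vDataFrame (hypothetical, not VerticaPy code)."""

    def __init__(self, conn, relation, columns):
        self.conn = conn
        self.relation = relation
        # Each "virtual column" maps an alias to the SQL expression behind it.
        self.vcolumns = {name: name for name in columns}

    def eval(self, alias, expr):
        # Record the new column as an expression; the table is never altered.
        self.vcolumns[alias] = expr
        return self

    def to_sql(self):
        # Generate the SELECT that reflects every recorded modification.
        select = ", ".join(
            f"{expr} AS {alias}" for alias, expr in self.vcolumns.items()
        )
        return f"SELECT {select} FROM {self.relation}"

    def avg(self, alias):
        # The aggregation runs in the database, which returns a single value.
        query = f"SELECT AVG({alias}) FROM ({self.to_sql()}) AS t"
        return self.conn.execute(query).fetchone()[0]


# SQLite is used here only so the sketch runs without a Vertica server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titanic (age REAL, fare REAL)")
conn.executemany("INSERT INTO titanic VALUES (?, ?)", [(20.0, 10.0), (40.0, 30.0)])

frame = LazyFrame(conn, "titanic", ["age", "fare"]).eval("fare_x2", "fare * 2")
mean_fare_x2 = frame.avg("fare_x2")

# The stored table still has only its two original columns.
stored_columns = [row[1] for row in conn.execute("PRAGMA table_info(titanic)")]
```

In this sketch the derived column exists only as an expression inside the generated query, so the underlying table is left untouched.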

Parameters¶
Name | Type | Optional | Description |
---|---|---|---|
input_relation | str / tablesample / pandas.DataFrame / list / numpy.ndarray / dict | ✓ | If the input_relation is of type str, it must represent the relation (view, table, or temporary table) used to create the object. To reference a relation in a specific schema, the string must include both the schema and the relation: 'schema.relation' or '"schema"."relation"'. Alternatively, you can use the 'schema' parameter, in which case the input_relation must exclude the schema name. If it is a pandas.DataFrame, a temporary local table is created. Otherwise, the vDataFrame is created using the generated SQL code of multiple UNIONs. |
columns | list | ✓ | List of column names. Only used when input_relation is an array-like type. |
usecols | list | ✓ | List of columns to use to create the object. Specifying fewer columns speeds up object creation. |
schema | str | ✓ | The schema of the relation. Use this parameter to select a relation within a particular schema, or when the schema or relation name contains period '.' characters. If specified, the input_relation parameter must exclude the schema. |
sql | str | ✓ | A SQL query from which to create the vDataFrame. If specified, the parameter 'input_relation' must be empty. |
external | bool | ✓ | A boolean to indicate whether it is an external table. If set to True, a Connection Identifier Database must be defined. See the connect.set_external_connection function for more information. |
symbol | str | ✓ | Symbol used to identify the external connection. See the connect.set_external_connection function for more information. |
sql_push_ext | bool | ✓ | If set to True, the external vDataFrame attempts to push the entire query to the external table (only DQL statements - SELECT; for other statements, use SQL Magic directly). This can improve performance but can also cause more errors, because some databases do not support the same SQL as Vertica. |
empty | bool | ✓ | If True, this function creates an empty vDataFrame. You can use this to create a custom vDataFrame, bypassing initialization checks. |
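The input_relation description states that array-like inputs are turned into "the generated SQL code of multiple UNIONs". A minimal sketch of what such generated SQL could look like follows. The `sql_literal` and `rows_to_union_sql` helpers are hypothetical, not VerticaPy's actual generator, and SQLite stands in for Vertica only to check that the query runs.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as a SQL literal (hypothetical helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def rows_to_union_sql(rows, columns):
    """Build one SELECT per row and join them with UNION ALL.

    Assumes column names contain no double-quote characters.
    """
    selects = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs one value per column name")
        items = ", ".join(
            f'{sql_literal(v)} AS "{c}"' for v, c in zip(row, columns)
        )
        selects.append(f"SELECT {items}")
    return " UNION ALL ".join(selects)


query = rows_to_union_sql([[1, "Ann"], [2, "O'Neil"]], columns=["id", "name"])

# Run the generated query to confirm it is valid SQL.
result = sqlite3.connect(":memory:").execute(query).fetchall()
```

Here the `columns` argument plays the same role as the 'columns' parameter in the table above: it names the values of each row, which otherwise carry no column names.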
Attributes¶
Name | Type | Description |
---|---|---|
_VERTICAPY_VARIABLES_ | dict | Dictionary containing all the vDataFrame attributes. |
vcolumns | vcolumn | Each vColumn of the vDataFrame is accessible by specifying its name between brackets. For example, to access "myVC", you can write vDataFrame["myVC"]. |
Example¶
This example creates a vDataFrame from the sample 'titanic' dataset. Start by loading the dataset:
In [1]:
from verticapy.datasets import load_titanic
titanic = load_titanic(name = "titanic", schema = "public")
You can then create a vDataFrame from the dataset in the following ways:
In [2]:
from verticapy import vDataFrame
# Create a vDataFrame using the schema and the relation name
# in the 'input_relation' parameter
vDataFrame(input_relation = '"public"."titanic"')
Out[2]:
In [6]:
# Create a vDataFrame using the relation name in 'input_relation'
# and the schema name in the 'schema' parameter
vDataFrame(input_relation = 'titanic', schema = 'public')
Out[6]:
In [7]:
# Create a vDataFrame using only a subset of the relation's columns
vDataFrame(input_relation = 'titanic', schema = 'public', usecols = ["age", "survived"])
Out[7]:
In [8]:
# Create a vDataFrame using a SQL query
vDataFrame(sql = "SELECT age, fare FROM public.titanic;")
Out[8]:
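The examples above name the relation either as a single '"schema"."relation"' string or through the separate 'schema' parameter. The sketch below shows why the quoted form matters when a name contains a period. `quote_ident` and `qualified_name` are hypothetical helpers written for this illustration, using standard SQL double-quote escaping; they are not part of VerticaPy.

```python
def quote_ident(name):
    """Double-quote a SQL identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def qualified_name(schema, relation):
    """Join schema and relation into an unambiguous quoted name."""
    return f"{quote_ident(schema)}.{quote_ident(relation)}"


# Same form as the first example: '"public"."titanic"'
plain = qualified_name("public", "titanic")

# A schema name containing a period stays in one piece once quoted,
# whereas the unquoted string 'my.schema.titanic' would be ambiguous.
dotted = qualified_name("my.schema", "titanic")
```

Passing the schema through its own parameter avoids the same ambiguity without manual quoting, which is the use case the 'schema' parameter description mentions.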