verticapy.vDataFrame.filter
- vDataFrame.filter(conditions: None | list | str = None, *args, **kwargs) → vDataFrame
Filters the vDataFrame using the input expressions.
Parameters
- conditions: SQLExpression, optional
A SQL expression or a list of SQL expressions. For example, to keep only the records where the vDataColumn ‘column’ is greater than 5 and less than 10, you can write:
['"column" > 5', '"column" < 10']
- force_filter: bool, optional
Default Value: True. When set to True, the vDataFrame is modified even if no filtering occurs. Use this parameter to enforce filtering and keep the pipeline consistent.
- raise_error: bool, optional
Default Value: False. If set to True and the input filtering is invalid, an error is raised.
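The `conditions` example above implies that the items of a list are combined with AND. The following minimal sketch illustrates that reading. It uses Python's built-in `sqlite3` as a stand-in for Vertica. `combine_conditions` and `filter_rows` are hypothetical helpers written for this illustration, not VerticaPy functions.

```python
import sqlite3


def combine_conditions(conditions):
    """Hypothetical helper: join one SQL predicate or a list of them with AND."""
    if conditions is None:
        return None
    if isinstance(conditions, str):
        conditions = [conditions]
    # Wrap each predicate in parentheses so an OR inside one item keeps its meaning.
    return " AND ".join(f"({c})" for c in conditions)


def filter_rows(conn, table, conditions):
    """Hypothetical helper: return the rows of `table` that satisfy every condition."""
    where = combine_conditions(conditions)
    query = f"SELECT * FROM {table}"
    if where:
        query += f" WHERE {where}"
    return conn.execute(query).fetchall()


# Toy in-memory table with one integer column literally named "column".
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("column" INTEGER)')
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in range(15)])

rows = filter_rows(conn, "t", ['"column" > 5', '"column" < 10'])
print(sorted(rows))  # [(6,), (7,), (8,), (9,)]
```

Only rows satisfying both predicates remain, which matches the "greater than 5 and less than 10" intent of the list.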
Returns
- vDataFrame
self
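Because `filter` returns `self`, the call modifies the vDataFrame in place, and successive calls can be chained. The toy class `ToyFrame` below is hypothetical. It only records conditions, as a sketch of the return-self pattern rather than VerticaPy's implementation.

```python
class ToyFrame:
    """Hypothetical stand-in for a vDataFrame that only records filter conditions."""

    def __init__(self):
        self.conditions = []

    def filter(self, conditions):
        # Accept one predicate or a list of predicates.
        if isinstance(conditions, str):
            conditions = [conditions]
        self.conditions.extend(conditions)  # modify the object in place...
        return self  # ...and return it so that calls can be chained


frame = ToyFrame()
result = frame.filter("sex = 'female'").filter("pclass = 1")
print(result is frame)  # True
print(frame.conditions)  # ["sex = 'female'", 'pclass = 1']
```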
Examples
We import verticapy:
import verticapy as vp
Hint
Assigning an alias to verticapy reduces the risk of name collisions with other libraries. This precaution matters because verticapy uses common function names such as “average” and “median”, which can conflict with functions of the same name elsewhere. With an alias, the verticapy functions are used as intended and do not interfere with functions from other libraries.
For this example, we will use the Titanic dataset:
from verticapy.datasets import load_titanic

vdf = load_titanic()
Note
VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use it. These datasets are useful for practicing data analysis and machine learning within the VerticaPy environment.
Using filter, we can create custom filters:
vdf.filter("sex = 'female' AND pclass = 1")
    pclass  ...  survived  home.dest
    Int          Int       Varchar(100)
    100%         100%      57%
1   1       ...  0         Montreal, PQ / Chesterville, ON
2   1       ...  0         Montreal, PQ / Chesterville, ON
3   1       ...  0         Montreal, PQ / Chesterville, ON
4   1       ...  0         Belfast, NI
5   1       ...  0         Montevideo, Uruguay
6   1       ...  0         New York, NY
7   1       ...  0         New York, NY
8   1       ...  0         Montreal, PQ
9   1       ...  0         Winnipeg, MN
10  1       ...  0         San Francisco, CA
11  1       ...  0         Trenton, NJ
12  1       ...  0         London / Winnipeg, MB
13  1       ...  0         Pomeroy, WA
14  1       ...  0         Omaha, NE
15  1       ...  0         Philadelphia, PA
16  1       ...  0         Washington, DC
17  1       ...  0         [null]
18  1       ...  0         New York, NY
19  1       ...  0         Montevideo, Uruguay
20  1       ...  0         Montevideo, Uruguay
Note
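The same filter can also be written in a pandas-like way:
vdf.filter((vdf["sex"] == "female") & (vdf["pclass"] == 1))
Or:
vdf = vdf[(vdf["sex"] == "female") & (vdf["pclass"] == 1)]
Warning
Make sure to use the & operator and to place the parentheses correctly.

The and keyword is built into Python and its behavior cannot be overloaded, so it cannot combine two vDataColumn conditions.

The parentheses are required because & binds more tightly than ==. Without them, vdf["sex"] == "female" & vdf["pclass"] == 1 is parsed as vdf["sex"] == ("female" & vdf["pclass"]) == 1.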
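To see why & works where and does not, here is a minimal pure-Python sketch of operator overloading. The classes `Col` and `Expr` are hypothetical stand-ins that build SQL strings. They are not VerticaPy's vDataColumn implementation, and the TypeError raised by `Expr.__bool__` is this sketch's own choice.

```python
class Expr:
    """Hypothetical wrapper around a SQL boolean expression."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        # `a & b` calls a.__and__(b), so a library can return a combined SQL expression.
        return Expr(f"({self.sql}) AND ({other.sql})")

    def __bool__(self):
        # `a and b` first calls bool(a), which must return True or False;
        # it can never build a combined SQL expression, so refuse instead.
        raise TypeError("use & instead of 'and' to combine expressions")


class Col:
    """Hypothetical column whose == comparison builds an Expr."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Expr(f'"{self.name}" = {value!r}')


sex, pclass = Col("sex"), Col("pclass")

ok = (sex == "female") & (pclass == 1)
print(ok.sql)  # ("sex" = 'female') AND ("pclass" = 1)

attempts = [
    ("and", lambda: (sex == "female") and (pclass == 1)),
    # & binds tighter than ==, so this evaluates "female" & pclass first.
    ("no parentheses", lambda: sex == "female" & pclass == 1),
]
for label, attempt in attempts:
    try:
        attempt()
    except TypeError as exc:
        print(f"{label}: TypeError: {exc}")
```

Both failing variants raise TypeError here. The and version fails inside `__bool__`, and the unparenthesized version fails because `str` and `Col` define no & between them.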