vDataFrame.balance
In [ ]:
vDataFrame.balance(column: str,
                   method: str = "hybrid",
                   x: float = 0.5,
                   order_by: list = [])
Balances the dataset using the input method.
⚠ Warning: If the data is not sorted (see the order_by parameter), the generated SQL code may differ between attempts.
Parameters
Name | Type | Optional | Description |
---|---|---|---|
column | str | ❌ | Column used to compute the different categories. |
method | str | ✓ | The method with which to sample the data. The examples on this page use 'hybrid' (the default), 'over', and 'under'. |
x | float | ✓ | The desired ratio between the majority class and minority classes. Only used when method is 'over' or 'under'. |
order_by | list | ✓ | vColumns used to sort the data. |
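To make the role of x more concrete, here is a minimal pure-Python sketch of undersampling. It is not VerticaPy's implementation and does not touch a database. It assumes one possible reading of x, namely that no class may keep more than minority_count / x rows. The helper name undersample, the seed argument, and the toy label counts are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(labels, x=0.5, seed=0):
    """Return sorted row indices after capping every class at minority_count / x rows.

    Assumption: x is read as the target minority-to-majority ratio.
    This is an illustrative sketch, not the SQL that vDataFrame.balance generates.
    """
    if not 0 < x <= 1:
        raise ValueError("x must be in (0, 1]")

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    # The smallest class sets the cap for all other classes.
    minority = min(len(idx) for idx in by_class.values())
    cap = int(minority / x)

    rng = random.Random(seed)  # fixed seed so repeated runs pick the same rows
    kept = []
    for idx in by_class.values():
        if len(idx) > cap:
            idx = rng.sample(idx, cap)  # randomly drop rows from oversized classes
        kept.extend(idx)
    return sorted(kept)


# Toy labels (made-up counts, not the real Titanic distribution).
labels = ["S"] * 90 + ["C"] * 30 + ["Q"] * 10
kept = undersample(labels, x=0.5)
counts = Counter(labels[i] for i in kept)
print(counts)  # S and C are capped at 20 rows each, Q keeps its 10 rows
```

Under this reading, x = 0.5 leaves the minority class at half the size of the largest remaining class. A smaller x keeps more majority rows, and x = 1 makes every class the same size as the minority class.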
In [5]:
from verticapy.datasets import load_titanic
titanic = load_titanic()
# minority class is Q
titanic["embarked"].topk()
Out[5]:
In [7]:
# method = "hybrid" (the default)
balance = titanic.balance(column = "embarked")
balance["embarked"].topk()
Out[7]:
In [8]:
# method = "over" (oversampling)
balance = titanic.balance(column = "embarked", method = "over", x = 0.5)
balance["embarked"].topk()
Out[8]:
In [9]:
# method = "under" (undersampling)
balance = titanic.balance(column = "embarked", method = "under", x = 0.5)
balance["embarked"].topk()
Out[9]: