verticapy.vDataFrame.fillna
- vDataFrame.fillna(val: dict | None = None, method: dict | None = None, numeric_only: bool = False) → vDataFrame
Fills missing elements in vDataColumn using specific rules.
Parameters
- val: dict, optional
Dictionary of values. The dictionary must be similar to the following: {"column1": val1 ..., "columnk": valk}. Each key of the dictionary must be a vDataColumn. The missing values of the input vDataColumn are replaced by the input value.
- method: dict, optional
Method used to impute the missing values, given as a dictionary that maps each vDataColumn to one of the following:
  - auto: Mean for numerical and mode for categorical vDataColumn.
  - mean: Average.
  - median: Median.
  - mode: Mode (most frequent element).
  - 0ifnull: 0 when the vDataColumn value is None, 1 otherwise.
More methods are available in the vDataColumn.fillna() method.
- numeric_only: bool, optional
If parameters ‘val’ and ‘method’ are empty and ‘numeric_only’ is set to True, all numerical vDataColumn are imputed by their average. If set to False, all categorical vDataColumn are also imputed by their mode.
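The rules listed above can be illustrated without a database connection. The following is a minimal pure-Python sketch, not VerticaPy's implementation: the helper names `impute_column` and `fillna_sketch` and the four-row table are hypothetical, columns are plain lists with `None` marking a missing value, and the `numeric_only` fallback is left out for brevity.

```python
from statistics import mean, median, mode


def impute_column(values, method):
    """Return a copy of `values` with None entries filled according to `method`."""
    present = [v for v in values if v is not None]
    if method == "0ifnull":
        # Indicator encoding: 0 where the value was missing, 1 otherwise.
        return [0 if v is None else 1 for v in values]
    if method == "auto":
        # Mean for numerical columns, mode for everything else.
        numeric = all(isinstance(v, (int, float)) for v in present)
        method = "mean" if numeric else "mode"
    if method == "mean":
        fill = mean(present)
    elif method == "median":
        fill = median(present)
    elif method == "mode":
        fill = mode(present)
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return [fill if v is None else v for v in values]


def fillna_sketch(table, val=None, method=None):
    """Apply per-column constants (`val`) or imputation rules (`method`)."""
    val, method = val or {}, method or {}
    out = {}
    for name, values in table.items():
        if name in val:
            out[name] = [val[name] if v is None else v for v in values]
        elif name in method:
            out[name] = impute_column(values, method[name])
        else:
            out[name] = list(values)  # untouched columns are copied as-is
    return out


# Hypothetical toy data, not taken from the Titanic dataset.
table = {
    "age": [20.0, None, 40.0, 30.0],
    "fare": [10.0, 50.0, None, 20.0],
    "embarked": ["S", None, "S", "C"],
    "boat": ["2", None, None, "11"],
}

filled = fillna_sketch(
    table,
    val={"boat": "No boat"},
    method={"age": "mean", "embarked": "mode", "fare": "median"},
)
boat_flags = impute_column(table["boat"], "0ifnull")
```

Here `val` takes precedence for a column that appears in both dictionaries; that ordering is a choice made for this sketch and is not a documented VerticaPy behavior.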
Returns
- vDataFrame
self
Examples
We import verticapy:
import verticapy as vp
Hint
By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.
For this example, we will use the Titanic dataset.
from verticapy.datasets import load_titanic
data = load_titanic()
Note
VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use it effectively. These datasets are invaluable resources for honing your data analysis and machine learning skills within the VerticaPy environment.
We can check the count of each column to see whether any column has missing values.
data.count()
                count
"pclass"       1234.0
"survived"     1234.0
"name"         1234.0
"sex"          1234.0
"age"           997.0
"sibsp"        1234.0
"parch"        1234.0
"ticket"       1234.0
"fare"         1233.0
"cabin"         286.0
"embarked"     1232.0
"boat"          439.0
"body"          118.0
"home.dest"     706.0
From the above table, we can see that the count for "boat" (439) is less than 1234, the count of the fully populated columns, so this column is missing some values. The same holds for "age", "fare", "cabin", "embarked", "body", and "home.dest".
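As a quick check on the table above, the number of missing values per column is the row count minus that column's count. The sketch below assumes the dataset has 1234 rows, which is inferred from the columns whose count is 1234; the counts are copied from the table, and the variable names are illustrative only.

```python
# Assumption: fully populated columns such as "pclass" report every row.
TOTAL_ROWS = 1234

# Non-null counts copied from the data.count() output above.
counts = {
    "pclass": 1234.0, "survived": 1234.0, "name": 1234.0, "sex": 1234.0,
    "age": 997.0, "sibsp": 1234.0, "parch": 1234.0, "ticket": 1234.0,
    "fare": 1233.0, "cabin": 286.0, "embarked": 1232.0, "boat": 439.0,
    "body": 118.0, "home.dest": 706.0,
}

# Keep only the columns that actually have missing values.
missing = {
    col: TOTAL_ROWS - int(n) for col, n in counts.items() if n < TOTAL_ROWS
}

for col, n in missing.items():
    print(f"{col}: {n} missing ({n / TOTAL_ROWS:.1%})")
```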
Now we can use the fillna method to fill those values. Let’s fill "boat" with a custom value and impute "age", "embarked", and "fare" with a per-column method.
data.fillna(
    val = {"boat": "No boat"},
    method = {
        "age": "mean",
        "embarked": "mode",
        "fare": "median",
    },
)
      pclass       ...   survived     home.dest
      Int, 100%          Int, 100%    Varchar(100), 57%
 1    1            ...   0            Montreal, PQ / Chesterville, ON
 2    1            ...   0            Montreal, PQ / Chesterville, ON
 3    1            ...   0            Montreal, PQ / Chesterville, ON
 4    1            ...   0            Belfast, NI
 5    1            ...   0            Montevideo, Uruguay
 6    1            ...   0            New York, NY
 7    1            ...   0            New York, NY
 8    1            ...   0            Montreal, PQ
 9    1            ...   0            Winnipeg, MN
10    1            ...   0            San Francisco, CA
11    1            ...   0            Trenton, NJ
12    1            ...   0            London / Winnipeg, MB
13    1            ...   0            Pomeroy, WA
14    1            ...   0            Omaha, NE
15    1            ...   0            Philadelphia, PA
16    1            ...   0            Washington, DC
17    1            ...   0            [null]
18    1            ...   0            New York, NY
19    1            ...   0            Montevideo, Uruguay
20    1            ...   0            Montevideo, Uruguay
See also
- vDataFrame.interpolate(): Fill missing values by interpolating.
- vDataColumn.fill_outliers(): Fill the outliers using the input method.