vDataFrame[].fillna

In [ ]:
vDataFrame[].fillna(val = None,
                    method: str = "auto",
                    expr: str = "",
                    by: list = [],
                    order_by: list = [])

Fills the vcolumn missing elements using specific rules.

Parameters

Name Type Optional Description
val
int / float / str
Value to use to impute the vcolumn.
method
dict
Method to use to impute the missing values.
  • auto : Mean for the numerical and Mode for the categorical vcolumns.
  • bfill : Back Propagation of the next element (Constant Interpolation).
  • ffill : Propagation of the first element (Constant Interpolation).
  • mean : Average.
  • median : Median.
  • mode : Mode (most occurent element).
  • 0ifnull : 0 when the vcolumn is null, 1 otherwise.
expr
str
SQL expression.
by
list
vcolumns used in the partition.
order_by
list
List of the vcolumns to use to sort the data when using TS methods.

Returns

vDataFrame : self.parent

Example

In [72]:
from verticapy.datasets import load_titanic
titanic = load_titanic()
display(titanic)
123
fare
Numeric(10,5)
123
survived
Int
Abc
sex
Varchar(20)
Abc
boat
Varchar(100)
123
pclass
Int
123
age
Numeric(6,3)
Abc
ticket
Varchar(36)
Abc
Varchar(164)
Abc
embarked
Varchar(20)
Abc
cabin
Varchar(30)
123
body
Int
123
parch
Int
Abc
home.dest
Varchar(100)
123
sibsp
Int
1151.550000female[null]12.000113781SC22 C26[null]2Montreal, PQ / Chesterville, ON1
2151.550000male[null]130.000113781SC22 C261352Montreal, PQ / Chesterville, ON1
3151.550000female[null]125.000113781SC22 C26[null]2Montreal, PQ / Chesterville, ON1
40.000000male[null]139.000112050SA36[null]0Belfast, NI0
549.504200male[null]171.000PC 17609C[null]220Montevideo, Uruguay0
6227.525000male[null]147.000PC 17757CC62 C641240New York, NY1
725.925000male[null]1[null]PC 17318S[null][null]0New York, NY0
8247.520800male[null]124.000PC 17558CB58 B60[null]1Montreal, PQ0
975.241700maleA136.00013050CC6[null]0Winnipeg, MN0
1026.000000male[null]125.00013905C[null]1480San Francisco, CA0
1135.500000male[null]145.000113784ST[null]0Trenton, NJ0
1226.550000male[null]142.000110489SD22[null]0London / Winnipeg, MB0
1330.500000male[null]141.000113054SA21[null]0Pomeroy, WA0
1450.495800male[null]148.000PC 17591CB102080Omaha, NE0
1539.600000male[null]1[null]112379C[null][null]0Philadelphia, PA0
1626.550000male[null]145.000113050SB38[null]0Washington, DC0
1731.000000male[null]1[null]113798S[null][null]0[null]0
185.000000male[null]133.000695SB51 B53 B55[null]0New York, NY0
1947.100000male[null]128.000113059S[null][null]0Montevideo, Uruguay0
2047.100000male[null]117.000113059S[null][null]0Montevideo, Uruguay0
2126.000000male[null]149.00019924S[null][null]0Ascot, Berkshire / Rochester, NY0
2278.850000male[null]136.00019877SC461720Little Onn Hall, Staffs1
2361.175000male[null]146.000W.E.P. 5734SE31[null]0Amenia, ND1
240.000000male[null]1[null]112051S[null][null]0Liverpool, England / Belfast0
25136.779200male[null]127.00013508CC89[null]0Los Angeles, CA1
2652.000000male[null]1[null]110465SA14[null]0Stoughton, MA0
2725.587500male[null]147.0005727SE58[null]0Victoria, BC0
2883.158300male[null]137.000PC 17756CE52[null]1Lakewood, NJ1
2926.550000male[null]1[null]113791S[null][null]0Roachdale, IN0
3071.000000male[null]170.000WE/P 5735SB222691Milwaukee, WI1
3171.283300male[null]139.000PC 17599CC85[null]0New York, NY1
3252.000000male[null]131.000F.C. 12750SB71[null]0Montreal, PQ1
33106.425000male[null]150.000PC 17761CC86620Deephaven, MN / Cedar Rapids, IA1
3429.700000male[null]139.000PC 17580CA181330Philadelphia, PA0
3531.679200female[null]136.000PC 17531CA29[null]0New York, NY0
36221.779200male[null]1[null]PC 17483SC95[null]0[null]0
3727.750000male[null]130.000113051CC111[null]0New York, NY0
38263.000000male[null]119.00019950SC23 C25 C27[null]2Winnipeg, MB3
39263.000000male[null]164.00019950SC23 C25 C27[null]4Winnipeg, MB1
4026.550000male[null]1[null]113778SD34[null]0Westcliff-on-Sea, Essex0
410.000000male[null]1[null]112058SB102[null]0[null]0
4253.100000male[null]137.000113803SC123[null]0Scituate, MA1
4338.500000male[null]147.000111320SE632750St Anne's-on-Sea, Lancashire0
4479.200000male[null]124.000PC 17593CB86[null]0[null]0
4534.654200male[null]171.000PC 17754CA5[null]0New York, NY0
46153.462500male[null]138.000PC 17582SC911471Winnipeg, MB0
4779.200000male[null]146.000PC 17593CB82 B84[null]0New York, NY0
4842.400000male[null]1[null]113796S[null][null]0[null]0
4983.475000male[null]145.00036973SC83[null]0New York, NY1
500.000000male[null]140.000112059SB941100[null]0
5193.500000male[null]155.00012749SB693071Montreal, PQ1
5242.500000male[null]142.000113038SB11[null]0London / Middlesex0
5351.862500male[null]1[null]17463SE46[null]0Brighton, MA0
5450.000000male[null]155.000680SC39[null]0London / Birmingham0
5552.000000male[null]142.000113789S[null]380New York, NY1
5630.695800male141[null]PC 17600C[null][null]0New York, NY0
5728.712500female[null]150.000PC 17595CC49[null]0Paris, France New York, NY0
5826.000000male[null]146.000694S[null]800Bennington, VT0
5926.000000male[null]150.000113044SE60[null]0London0
60211.500000male[null]132.500113503CC132450[null]0
6129.700000male[null]158.00011771CB372580Buffalo, NY0
6251.862500male[null]141.00017464SD21[null]0Southington / Noank, CT1
6326.550000male[null]1[null]113028SC124[null]0Portland, OR0
6427.720800male[null]1[null]PC 17612C[null][null]0Chicago, IL0
6530.000000male[null]129.000113501SD61260Springfield, MA0
6645.500000male[null]130.000113801S[null][null]0London / New York, NY0
6726.000000male[null]130.000110469SC106[null]0Brockton, MA0
6853.100000male[null]119.000113773SD30[null]0New York, NY1
6975.241700male[null]146.00013050CC62920Vancouver, BC0
7051.862500male[null]154.00017463SE461750Dorchester, MA0
7182.170800male[null]128.000PC 17604C[null][null]0New York, NY1
7226.550000male[null]165.00013509SE382490East Bridgewater, MA0
7390.000000male[null]144.00019928QC782300Fond du Lac, WI2
7430.500000male[null]155.000113787SC30[null]0Montreal, PQ0
7542.400000male[null]147.000113796S[null][null]0Washington, DC0
7629.700000male[null]137.000PC 17596CC118[null]1Brooklyn, NY0
77113.275000male[null]158.00035273CD481222Lexington, MA0
7826.000000male[null]164.000693S[null]2630Isle of Wight, England0
7961.979200male[null]165.000113509CB302341Providence, RI0
8027.720800male[null]128.500PC 17562CD431890?Havana, Cuba0
810.000000male[null]1[null]112052S[null][null]0Belfast0
8228.500000male[null]145.500113043SC1241660Surbiton Hill, Surrey0
8393.500000male[null]123.00012749SB24[null]0Montreal, PQ0
8466.600000male[null]129.000113776SC2[null]0Isleworth, England1
85108.900000male[null]118.000PC 17758CC65[null]0Madrid, Spain1
8652.000000male[null]147.000110465SC1102070Worcester, MA0
870.000000male[null]138.00019972S[null][null]0Rotterdam, Netherlands0
88135.633300male[null]122.000PC 17760C[null]2320[null]0
89227.525000male[null]1[null]PC 17757C[null][null]0[null]0
9050.495800male[null]131.000PC 17590SA24[null]0Trenton, NJ0
9150.000000male[null]1[null]113767SA32[null]0Seattle, WA0
9240.125000male[null]136.00013049CA10[null]0Winnipeg, MB0
9359.400000male[null]155.000PC 17603C[null][null]0New York, NY1
9426.550000male[null]133.000113790S[null]1090London0
95262.375000male[null]161.000PC 17608CB57 B59 B63 B66[null]3Haverford, PA / Cooperstown, NY1
9655.900000male[null]150.00013507SE44[null]0Duluth, MN1
9726.550000male[null]156.000113792S[null][null]0New York, NY0
9830.695800male[null]156.00017764CA7[null]0St James, Long Island, NY0
9960.000000male[null]124.00013695SC31[null]0Huntington, WV1
10026.000000male[null]1[null]113056SA19[null]0Streatham, Surrey0
Rows: 1-100 of 1234 | Columns: 14
In [73]:
display(titanic["age"])
# Imputation using the avg over partition by pclass and sex
titanic["age"].fillna(method = "avg", by = ["pclass", "sex"])
123
age
Numeric(6,3)
12.000
230.000
325.000
439.000
571.000
647.000
7[null]
824.000
936.000
1025.000
1145.000
1242.000
1341.000
1448.000
15[null]
1645.000
17[null]
1833.000
1928.000
2017.000
2149.000
2236.000
2346.000
24[null]
2527.000
26[null]
2747.000
2837.000
29[null]
3070.000
3139.000
3231.000
3350.000
3439.000
3536.000
36[null]
3730.000
3819.000
3964.000
40[null]
41[null]
4237.000
4347.000
4424.000
4571.000
4638.000
4746.000
48[null]
4945.000
5040.000
5155.000
5242.000
53[null]
5455.000
5542.000
56[null]
5750.000
5846.000
5950.000
6032.500
6158.000
6241.000
63[null]
64[null]
6529.000
6630.000
6730.000
6819.000
6946.000
7054.000
7128.000
7265.000
7344.000
7455.000
7547.000
7637.000
7758.000
7864.000
7965.000
8028.500
81[null]
8245.500
8323.000
8429.000
8518.000
8647.000
8738.000
8822.000
89[null]
9031.000
91[null]
9236.000
9355.000
9433.000
9561.000
9650.000
9756.000
9856.000
9924.000
100[null]
Rows: 1-100 of 1234 | Column: age | Type: numeric(6,3)
237 element(s) was/were filled
123
age
Numeric(6,3)
136.0
231.0
321.0
450.0
545.0
631.0
755.0
860.0
935.0
1037.2635658914729
1139.0
1237.2635658914729
1339.0
1418.0
1548.0
1662.0
1752.0
1843.0
1937.2635658914729
2040.0
2123.0
2218.0
2339.0
2440.0
2530.0
2635.0
2724.0
2848.0
2918.0
3054.0
3133.0
3233.0
3343.0
3456.0
3530.0
3617.0
3722.0
3822.0
3939.0
4019.0
4123.0
4231.0
4337.0
4433.0
4537.2635658914729
4618.0
4716.0
4815.0
4958.0
5021.0
5151.0
5216.0
5355.0
5449.0
5530.0
5639.0
5745.0
5837.2635658914729
5938.0
6035.0
6135.0
6251.0
6344.0
6416.0
6552.0
6624.0
6735.0
6849.0
6925.0
7045.0
7158.0
7219.0
7337.2635658914729
7445.0
7522.0
7635.0
7735.0
7848.0
7922.0
8037.2635658914729
8130.0
8260.0
8323.0
8428.0
8524.0
8637.2635658914729
8737.2635658914729
8854.0
8938.0
9023.0
9148.0
9227.0
9354.0
9417.0
9527.0
9633.0
9738.0
9864.0
9936.0
10055.0
Out[73]:
Rows: 1-100 of 1234 | Column: age | Type: numeric(6,3)
In [74]:
display(titanic["age"])
# Imputation using the median over partition by pclass and sex
titanic["age"].fillna(method = "median", by = ["pclass", "sex"])
123
age
Numeric(6,3)
136.0
231.0
321.0
450.0
545.0
631.0
755.0
860.0
935.0
1037.2635658914729
1139.0
1237.2635658914729
1339.0
1418.0
1548.0
1662.0
1752.0
1843.0
1937.2635658914729
2040.0
2123.0
2218.0
2339.0
2440.0
2530.0
2635.0
2724.0
2848.0
2918.0
3054.0
3133.0
3233.0
3343.0
3456.0
3530.0
3617.0
3722.0
3822.0
3939.0
4019.0
4123.0
4231.0
4337.0
4433.0
4537.2635658914729
4618.0
4716.0
4815.0
4958.0
5021.0
5151.0
5216.0
5355.0
5449.0
5530.0
5639.0
5745.0
5837.2635658914729
5938.0
6035.0
6135.0
6251.0
6344.0
6416.0
6552.0
6624.0
6735.0
6849.0
6925.0
7045.0
7158.0
7219.0
7337.2635658914729
7445.0
7522.0
7635.0
7735.0
7848.0
7922.0
8037.2635658914729
8130.0
8260.0
8323.0
8428.0
8524.0
8637.2635658914729
8737.2635658914729
8854.0
8938.0
9023.0
9148.0
9227.0
9354.0
9417.0
9527.0
9633.0
9738.0
9864.0
9936.0
10055.0
Rows: 1-100 of 1234 | Column: age | Type: numeric(6,3)
Nothing was filled
123
age
Numeric(6,3)
136.0
231.0
321.0
450.0
545.0
631.0
755.0
860.0
935.0
1037.2635658914729
1139.0
1237.2635658914729
1339.0
1418.0
1548.0
1662.0
1752.0
1843.0
1937.2635658914729
2040.0
2123.0
2218.0
2339.0
2440.0
2530.0
2635.0
2724.0
2848.0
2918.0
3054.0
3133.0
3233.0
3343.0
3456.0
3530.0
3617.0
3722.0
3822.0
3939.0
4019.0
4123.0
4231.0
4337.0
4433.0
4537.2635658914729
4618.0
4716.0
4815.0
4958.0
5021.0
5151.0
5216.0
5355.0
5449.0
5530.0
5639.0
5745.0
5837.2635658914729
5938.0
6035.0
6135.0
6251.0
6344.0
6416.0
6552.0
6624.0
6735.0
6849.0
6925.0
7045.0
7158.0
7219.0
7337.2635658914729
7445.0
7522.0
7635.0
7735.0
7848.0
7922.0
8037.2635658914729
8130.0
8260.0
8323.0
8428.0
8524.0
8637.2635658914729
8737.2635658914729
8854.0
8938.0
9023.0
9148.0
9227.0
9354.0
9417.0
9527.0
9633.0
9738.0
9864.0
9936.0
10055.0
Out[74]:
Rows: 1-100 of 1234 | Column: age | Type: numeric(6,3)
In [75]:
display(titanic["embarked"])
# Imputation using the mode
titanic["embarked"].fillna(method = "mode")
Abc
embarked
Varchar(20)
1S
2S
3S
4S
5C
6C
7S
8C
9C
10C
11S
12S
13S
14C
15C
16S
17S
18S
19S
20S
21S
22S
23S
24S
25C
26S
27S
28C
29S
30S
31C
32S
33C
34C
35C
36S
37C
38S
39S
40S
41S
42S
43S
44C
45C
46S
47C
48S
49S
50S
51S
52S
53S
54S
55S
56C
57C
58S
59S
60C
61C
62S
63S
64C
65S
66S
67S
68S
69C
70S
71C
72S
73Q
74S
75S
76C
77C
78S
79C
80C
81S
82S
83S
84S
85C
86S
87S
88C
89C
90S
91S
92C
93C
94S
95C
96S
97S
98C
99S
100S
Rows: 1-100 of 1234 | Column: embarked | Type: varchar(20)
2 element(s) was/were filled
Abc
embarked
Varchar(20)
1S
2S
3S
4S
5C
6C
7S
8C
9C
10C
11S
12S
13S
14C
15C
16S
17S
18S
19S
20S
21S
22S
23S
24S
25C
26S
27S
28C
29S
30S
31C
32S
33C
34C
35C
36S
37C
38S
39S
40S
41S
42S
43S
44C
45C
46S
47C
48S
49S
50S
51S
52S
53S
54S
55S
56C
57C
58S
59S
60C
61C
62S
63S
64C
65S
66S
67S
68S
69C
70S
71C
72S
73Q
74S
75S
76C
77C
78S
79C
80C
81S
82S
83S
84S
85C
86S
87S
88C
89C
90S
91S
92C
93C
94S
95C
96S
97S
98C
99S
100S
Out[75]:
Rows: 1-100 of 1234 | Column: embarked | Type: varchar(20)
In [76]:
display(titanic["boat"])
# As the missing values are the most occurent categories, an idea
# could be to merge all the non-null category (category 1) and to
# impute the missing values par 0. '0ifnull' method is following this
# process
titanic["boat"].fillna(method = "0ifnull")
Abc
boat
Varchar(100)
1[null]
2[null]
3[null]
4[null]
5[null]
6[null]
7[null]
8[null]
9A
10[null]
11[null]
12[null]
13[null]
14[null]
15[null]
16[null]
17[null]
18[null]
19[null]
20[null]
21[null]
22[null]
23[null]
24[null]
25[null]
26[null]
27[null]
28[null]
29[null]
30[null]
31[null]
32[null]
33[null]
34[null]
35[null]
36[null]
37[null]
38[null]
39[null]
40[null]
41[null]
42[null]
43[null]
44[null]
45[null]
46[null]
47[null]
48[null]
49[null]
50[null]
51[null]
52[null]
53[null]
54[null]
55[null]
5614
57[null]
58[null]
59[null]
60[null]
61[null]
62[null]
63[null]
64[null]
65[null]
66[null]
67[null]
68[null]
69[null]
70[null]
71[null]
72[null]
73[null]
74[null]
75[null]
76[null]
77[null]
78[null]
79[null]
80[null]
81[null]
82[null]
83[null]
84[null]
85[null]
86[null]
87[null]
88[null]
89[null]
90[null]
91[null]
92[null]
93[null]
94[null]
95[null]
96[null]
97[null]
98[null]
99[null]
100[null]
Rows: 1-100 of 1234 | Column: boat | Type: varchar(100)
795 element(s) was/were filled
123
boat
Bool
10
20
30
40
50
60
70
80
91
100
110
120
130
140
150
160
170
180
190
200
210
220
230
240
250
260
270
280
290
300
310
320
330
340
350
360
370
380
390
400
410
420
430
440
450
460
470
480
490
500
510
520
530
540
550
561
570
580
590
600
610
620
630
640
650
660
670
680
690
700
710
720
730
740
750
760
770
780
790
800
810
820
830
840
850
860
870
880
890
900
910
920
930
940
950
960
970
980
990
1000
Out[76]:
Rows: 1-100 of 1234 | Column: boat | Type: bool

See Also

vDataFrame[].dropna Drops the vcolumn missing values.