VerticaPy

Python API for Vertica Data Science at Scale

The 'Magic' Methods of the vDataFrame

VerticaPy 0.3.2 introduces the 'Magic' methods, which offer some additional flexilibility for mathematical operations in the vDataFrame. These methods let you handle many operations in a 'pandas-like' or Pythonic style.

In [11]:
from verticapy.datasets import load_titanic
titanic = load_titanic()
display(titanic)
123
pclass
Int
123
survived
Int
Abc
Varchar(164)
Abc
sex
Varchar(20)
123
age
Numeric(6,3)
123
sibsp
Int
123
parch
Int
Abc
ticket
Varchar(36)
123
fare
Numeric(10,5)
Abc
cabin
Varchar(30)
Abc
embarked
Varchar(20)
Abc
boat
Varchar(100)
123
body
Int
Abc
home.dest
Varchar(100)
110female2.012113781151.55C22 C26S[null][null]Montreal, PQ / Chesterville, ON
210male30.012113781151.55C22 C26S[null]135Montreal, PQ / Chesterville, ON
310female25.012113781151.55C22 C26S[null][null]Montreal, PQ / Chesterville, ON
410male39.0001120500.0A36S[null][null]Belfast, NI
510male71.000PC 1760949.5042[null]C[null]22Montevideo, Uruguay
610male47.010PC 17757227.525C62 C64C[null]124New York, NY
710male[null]00PC 1731825.925[null]S[null][null]New York, NY
810male24.001PC 17558247.5208B58 B60C[null][null]Montreal, PQ
910male36.0001305075.2417C6CA[null]Winnipeg, MN
1010male25.0001390526.0[null]C[null]148San Francisco, CA
1110male45.00011378435.5TS[null][null]Trenton, NJ
1210male42.00011048926.55D22S[null][null]London / Winnipeg, MB
1310male41.00011305430.5A21S[null][null]Pomeroy, WA
1410male48.000PC 1759150.4958B10C[null]208Omaha, NE
1510male[null]0011237939.6[null]C[null][null]Philadelphia, PA
1610male45.00011305026.55B38S[null][null]Washington, DC
1710male[null]0011379831.0[null]S[null][null][null]
1810male33.0006955.0B51 B53 B55S[null][null]New York, NY
1910male28.00011305947.1[null]S[null][null]Montevideo, Uruguay
2010male17.00011305947.1[null]S[null][null]Montevideo, Uruguay
2110male49.0001992426.0[null]S[null][null]Ascot, Berkshire / Rochester, NY
2210male36.0101987778.85C46S[null]172Little Onn Hall, Staffs
2310male46.010W.E.P. 573461.175E31S[null][null]Amenia, ND
2410male[null]001120510.0[null]S[null][null]Liverpool, England / Belfast
2510male27.01013508136.7792C89C[null][null]Los Angeles, CA
2610male[null]0011046552.0A14S[null][null]Stoughton, MA
2710male47.000572725.5875E58S[null][null]Victoria, BC
2810male37.011PC 1775683.1583E52C[null][null]Lakewood, NJ
2910male[null]0011379126.55[null]S[null][null]Roachdale, IN
3010male70.011WE/P 573571.0B22S[null]269Milwaukee, WI
3110male39.010PC 1759971.2833C85C[null][null]New York, NY
3210male31.010F.C. 1275052.0B71S[null][null]Montreal, PQ
3310male50.010PC 17761106.425C86C[null]62Deephaven, MN / Cedar Rapids, IA
3410male39.000PC 1758029.7A18C[null]133Philadelphia, PA
3510female36.000PC 1753131.6792A29C[null][null]New York, NY
3610male[null]00PC 17483221.7792C95S[null][null][null]
3710male30.00011305127.75C111C[null][null]New York, NY
3810male19.03219950263.0C23 C25 C27S[null][null]Winnipeg, MB
3910male64.01419950263.0C23 C25 C27S[null][null]Winnipeg, MB
4010male[null]0011377826.55D34S[null][null]Westcliff-on-Sea, Essex
4110male[null]001120580.0B102S[null][null][null]
4210male37.01011380353.1C123S[null][null]Scituate, MA
4310male47.00011132038.5E63S[null]275St Anne's-on-Sea, Lancashire
4410male24.000PC 1759379.2B86C[null][null][null]
4510male71.000PC 1775434.6542A5C[null][null]New York, NY
4610male38.001PC 17582153.4625C91S[null]147Winnipeg, MB
4710male46.000PC 1759379.2B82 B84C[null][null]New York, NY
4810male[null]0011379642.4[null]S[null][null][null]
4910male45.0103697383.475C83S[null][null]New York, NY
5010male40.0001120590.0B94S[null]110[null]
5110male55.0111274993.5B69S[null]307Montreal, PQ
5210male42.00011303842.5B11S[null][null]London / Middlesex
5310male[null]001746351.8625E46S[null][null]Brighton, MA
5410male55.00068050.0C39S[null][null]London / Birmingham
5510male42.01011378952.0[null]S[null]38New York, NY
5610male[null]00PC 1760030.6958[null]C14[null]New York, NY
5710female50.000PC 1759528.7125C49C[null][null]Paris, France New York, NY
5810male46.00069426.0[null]S[null]80Bennington, VT
5910male50.00011304426.0E60S[null][null]London
6010male32.500113503211.5C132C[null]45[null]
6110male58.0001177129.7B37C[null]258Buffalo, NY
6210male41.0101746451.8625D21S[null][null]Southington / Noank, CT
6310male[null]0011302826.55C124S[null][null]Portland, OR
6410male[null]00PC 1761227.7208[null]C[null][null]Chicago, IL
6510male29.00011350130.0D6S[null]126Springfield, MA
6610male30.00011380145.5[null]S[null][null]London / New York, NY
6710male30.00011046926.0C106S[null][null]Brockton, MA
6810male19.01011377353.1D30S[null][null]New York, NY
6910male46.0001305075.2417C6C[null]292Vancouver, BC
7010male54.0001746351.8625E46S[null]175Dorchester, MA
7110male28.010PC 1760482.1708[null]C[null][null]New York, NY
7210male65.0001350926.55E38S[null]249East Bridgewater, MA
7310male44.0201992890.0C78Q[null]230Fond du Lac, WI
7410male55.00011378730.5C30S[null][null]Montreal, PQ
7510male47.00011379642.4[null]S[null][null]Washington, DC
7610male37.001PC 1759629.7C118C[null][null]Brooklyn, NY
7710male58.00235273113.275D48C[null]122Lexington, MA
7810male64.00069326.0[null]S[null]263Isle of Wight, England
7910male65.00111350961.9792B30C[null]234Providence, RI
8010male28.500PC 1756227.7208D43C[null]189?Havana, Cuba
8110male[null]001120520.0[null]S[null][null]Belfast
8210male45.50011304328.5C124S[null]166Surbiton Hill, Surrey
8310male23.0001274993.5B24S[null][null]Montreal, PQ
8410male29.01011377666.6C2S[null][null]Isleworth, England
8510male18.010PC 17758108.9C65C[null][null]Madrid, Spain
8610male47.00011046552.0C110S[null]207Worcester, MA
8710male38.000199720.0[null]S[null][null]Rotterdam, Netherlands
8810male22.000PC 17760135.6333[null]C[null]232[null]
8910male[null]00PC 17757227.525[null]C[null][null][null]
9010male31.000PC 1759050.4958A24S[null][null]Trenton, NJ
9110male[null]0011376750.0A32S[null][null]Seattle, WA
9210male36.0001304940.125A10C[null][null]Winnipeg, MB
9310male55.010PC 1760359.4[null]C[null][null]New York, NY
9410male33.00011379026.55[null]S[null]109London
9510male61.013PC 17608262.375B57 B59 B63 B66C[null][null]Haverford, PA / Cooperstown, NY
9610male50.0101350755.9E44S[null][null]Duluth, MN
9710male56.00011379226.55[null]S[null][null]New York, NY
9810male56.0001776430.6958A7C[null][null]St James, Long Island, NY
9910male24.0101369560.0C31S[null][null]Huntington, WV
10010male[null]0011305626.0A19S[null][null]Streatham, Surrey
Rows: 1-100 | Columns: 14

Feature Engineering, 'pandas'-style

You can create new features with in a 'pandas' style.

In [12]:
titanic["family_size"] = titanic["parch"] + titanic["sibsp"] + 1
titanic[["sibsp", "parch", "family_size"]]
Out[12]:
123
sibsp
Int
123
parch
Int
123
family_size
Int
1124
2124
3124
4001
5001
6102
7001
8012
9001
10001
11001
12001
13001
14001
15001
16001
17001
18001
19001
20001
21001
22102
23102
24001
25102
26001
27001
28113
29001
30113
31102
32102
33102
34001
35001
36001
37001
38326
39146
40001
41001
42102
43001
44001
45001
46012
47001
48001
49102
50001
51113
52001
53001
54001
55102
56001
57001
58001
59001
60001
61001
62102
63001
64001
65001
66001
67001
68102
69001
70001
71102
72001
73203
74001
75001
76012
77023
78001
79012
80001
81001
82001
83001
84102
85102
86001
87001
88001
89001
90001
91001
92001
93102
94001
95135
96102
97001
98001
99102
100001
Rows: 1-100 | Columns: 3

You can also create features from various mathematical functions.

In [3]:
import verticapy.stats as st
titanic["ln_fare"] = st.ln(titanic["fare"])
titanic[["fare", "ln_fare"]]
Out[3]:
123
fare
Numeric(10,5)
123
ln_fare
Float
1151.555.02091560350381
2151.555.02091560350381
3151.555.02091560350381
40.0-inf
549.50423.90205751446006
6227.5255.42726012246342
725.9253.25520775411559
8247.52085.51149461888159
975.24174.32070559854228
1026.03.25809653802148
1135.53.56953269648137
1226.553.27902974768795
1330.53.41772668361337
1450.49583.92189016450581
1539.63.67882911826043
1626.553.27902974768795
1731.03.43398720448515
185.01.6094379124341
1947.13.85227300102237
2047.13.85227300102237
2126.03.25809653802148
2278.854.36754731340905
2361.1754.11373860932874
240.0-inf
25136.77924.91836794684454
2652.03.95124371858143
2725.58753.24210395098741
2883.15834.42074602026042
2926.553.27902974768795
3071.04.26267987704132
3171.28334.26666207838721
3252.03.95124371858143
33106.4254.66744051171422
3429.73.39114704580865
3531.67923.45566031410212
36221.77925.40168229234028
3727.753.32323584019244
38263.05.57215403217776
39263.05.57215403217776
4026.553.27902974768795
410.0-inf
4253.13.97217692824789
4338.53.65065824129374
4479.24.37197629882038
4534.65423.54541893041153
46153.46255.03345623750696
4779.24.37197629882038
4842.43.74714836223791
4983.4754.42454718582972
500.0-inf
5193.54.53796143629464
5242.53.74950407593037
5351.86253.94859598565899
5450.03.91202300542815
5552.03.95124371858143
5630.69583.42412583741652
5728.71253.35733256801522
5826.03.25809653802148
5926.03.25809653802148
60211.55.35422499848633
6129.73.39114704580865
6251.86253.94859598565899
6326.553.27902974768795
6427.72083.32218303393412
6530.03.40119738166216
6645.53.8177123259569
6726.03.25809653802148
6853.13.97217692824789
6975.24174.32070559854228
7051.86253.94859598565899
7182.17084.40880000780905
7226.553.27902974768795
7390.04.49980967033027
7430.53.41772668361337
7542.43.74714836223791
7629.73.39114704580865
77113.2754.7298184905532
7826.03.25809653802148
7961.97924.12679884488682
8027.72083.32218303393412
810.0-inf
8228.53.3499040872746
8393.54.53796143629464
8466.64.19870457754634
85108.94.69043002993891
8652.03.95124371858143
870.0-inf
88135.63334.90995492057888
89227.5255.42726012246342
9050.49583.92189016450581
9150.03.91202300542815
9240.1253.69199958145018
9359.44.0842942263686
9426.553.27902974768795
95262.3755.5697747781408
9655.94.02356438016105
9726.553.27902974768795
9830.69583.42412583741652
9960.04.0943445622221
10026.03.25809653802148
Rows: 1-100 | Columns: 2
In [4]:
titanic["x"] = 1 - st.exp(-titanic["fare"])
titanic[["fare", "x"]]
Out[4]:
123
fare
Numeric(10,5)
123
x
Float
1151.551.0
2151.551.0
3151.551.0
40.00.0
549.50421.0
6227.5251.0
725.9250.999999999994493
8247.52081.0
975.24171.0
1026.00.999999999994891
1135.51.0
1226.550.999999999997052
1330.50.999999999999943
1450.49581.0
1539.61.0
1626.550.999999999997052
1731.00.999999999999966
185.00.993262053000915
1947.11.0
2047.11.0
2126.00.999999999994891
2278.851.0
2361.1751.0
240.00.0
25136.77921.0
2652.01.0
2725.58750.999999999992282
2883.15831.0
2926.550.999999999997052
3071.01.0
3171.28331.0
3252.01.0
33106.4251.0
3429.70.999999999999874
3531.67920.999999999999983
36221.77921.0
3727.750.999999999999112
38263.01.0
39263.01.0
4026.550.999999999997052
410.00.0
4253.11.0
4338.51.0
4479.21.0
4534.65420.999999999999999
46153.46251.0
4779.21.0
4842.41.0
4983.4751.0
500.00.0
5193.51.0
5242.51.0
5351.86251.0
5450.01.0
5552.01.0
5630.69580.999999999999953
5728.71250.999999999999661
5826.00.999999999994891
5926.00.999999999994891
60211.51.0
6129.70.999999999999874
6251.86251.0
6326.550.999999999997052
6427.72080.999999999999086
6530.00.999999999999906
6645.51.0
6726.00.999999999994891
6853.11.0
6975.24171.0
7051.86251.0
7182.17081.0
7226.550.999999999997052
7390.01.0
7430.50.999999999999943
7542.41.0
7629.70.999999999999874
77113.2751.0
7826.00.999999999994891
7961.97921.0
8027.72080.999999999999086
810.00.0
8228.50.999999999999581
8393.51.0
8466.61.0
85108.91.0
8652.01.0
870.00.0
88135.63331.0
89227.5251.0
9050.49581.0
9150.01.0
9240.1251.0
9359.41.0
9426.550.999999999997052
95262.3751.0
9655.91.0
9726.550.999999999997052
9830.69580.999999999999953
9960.01.0
10026.00.999999999994891
Rows: 1-100 | Columns: 2

Conditional Operators

You can now filter your data with conditional operators like and ('&'), or ('|'), equals ('=='), not equals (!=), and more!

Equal Operator (==)

In [13]:
# Identifies the passengers who came alone

single_family = titanic[titanic["family_size"] == 1]
single_family
Out[13]:
123
pclass
Int
123
survived
Int
Abc
Varchar(164)
Abc
sex
Varchar(20)
123
age
Numeric(6,3)
123
sibsp
Int
123
parch
Int
Abc
ticket
Varchar(36)
123
fare
Numeric(10,5)
Abc
cabin
Varchar(30)
Abc
embarked
Varchar(20)
Abc
boat
Varchar(100)
123
body
Int
Abc
Varchar(100)
123
family_size
Int
110male39.0001120500.0A36S[null][null]1
210male71.000PC 1760949.5042[null]C[null]221
310male[null]00PC 1731825.925[null]S[null][null]1
410male36.0001305075.2417C6CA[null]1
510male25.0001390526.0[null]C[null]1481
610male45.00011378435.5TS[null][null]1
710male42.00011048926.55D22S[null][null]1
810male41.00011305430.5A21S[null][null]1
910male48.000PC 1759150.4958B10C[null]2081
1010male[null]0011237939.6[null]C[null][null]1
1110male45.00011305026.55B38S[null][null]1
1210male[null]0011379831.0[null]S[null][null]1
1310male33.0006955.0B51 B53 B55S[null][null]1
1410male28.00011305947.1[null]S[null][null]1
1510male17.00011305947.1[null]S[null][null]1
1610male49.0001992426.0[null]S[null][null]1
1710male[null]001120510.0