VerticaPy

Python API for Vertica Data Science at Scale

The Virtual DataFrame

The Virtual DataFrame is the main object of the library. It is the perfect transition from small to Big Data. The principle is quite simple. As Vertica is a powerful columnar MPP DataBase with many built-in function, Vertica is a veritable asset to do heavy computations.

Indeed, columnar orientation allows high compression and it will also avoid useless parsing to retrieve the data. The MPP (Massive Parallel Processing) allows to parallelize all the computations accross the different nodes.

Instead of working in memory where many limitations avoid to work properly on huge volumes of data, VerticaPy takes the advantage of Vertica scalability to push all the heavy computations to the platform which will aggregate and return the final result.



Thanks to Python flexibility, it is then easy to add many abstractions to help users simplifying the entire process. The vDataFrame is an abstraction which will help any user to enjoy doing the Data Science cycle entirely inside the Vertica DataBase.

To understand the object, there is nothing better than examples.

vDataFrame Creation

There are 2 main ways to create a vDataFrame.

The first one is to directly create a vDataFrame using an existing relation.

In [30]:
from verticapy import vDataFrame
from verticapy import vertica_conn

cur = vertica_conn("VerticaDSN").cursor()

vDataFrame("public.titanic", cur)
123
fare
Numeric(10,5)
123
survived
Int
Abc
sex
Varchar(20)
Abc
boat
Varchar(100)
123
pclass
Int
123
age
Numeric(6,3)
Abc
ticket
Varchar(36)
Abc
Varchar(164)
Abc
embarked
Varchar(20)
Abc
cabin
Varchar(30)
123
body
Int
123
parch
Int
Abc
home.dest
Varchar(100)
123
sibsp
Int
1151.550000female[null]12.000113781SC22 C26[null]2Montreal, PQ / Chesterville, ON1
2151.550000male[null]130.000113781SC22 C261352Montreal, PQ / Chesterville, ON1
3151.550000female[null]125.000113781SC22 C26[null]2Montreal, PQ / Chesterville, ON1
40.000000male[null]139.000112050SA36[null]0Belfast, NI0
549.504200male[null]171.000PC 17609C[null]220Montevideo, Uruguay0
6227.525000male[null]147.000PC 17757CC62 C641240New York, NY1
725.925000male[null]1[null]PC 17318S[null][null]0New York, NY0
8247.520800male[null]124.000PC 17558CB58 B60[null]1Montreal, PQ0
975.241700maleA136.00013050CC6[null]0Winnipeg, MN0
1026.000000male[null]125.00013905C[null]1480San Francisco, CA0
1135.500000male[null]145.000113784ST[null]0Trenton, NJ0
1226.550000male[null]142.000110489SD22[null]0London / Winnipeg, MB0
1330.500000male[null]141.000113054SA21[null]0Pomeroy, WA0
1450.495800male[null]148.000PC 17591CB102080Omaha, NE0
1539.600000male[null]1[null]112379C[null][null]0Philadelphia, PA0
1626.550000male[null]145.000113050SB38[null]0Washington, DC0
1731.000000male[null]1[null]113798S[null][null]0[null]0
185.000000male[null]133.000695SB51 B53 B55[null]0New York, NY0
1947.100000male[null]128.000113059S[null][null]0Montevideo, Uruguay0
2047.100000male[null]117.000113059S[null][null]0Montevideo, Uruguay0
2126.000000male[null]149.00019924S[null][null]0Ascot, Berkshire / Rochester, NY0
2278.850000male[null]136.00019877SC461720Little Onn Hall, Staffs1
2361.175000male[null]146.000W.E.P. 5734SE31[null]0Amenia, ND1
240.000000male[null]1[null]112051S[null][null]0Liverpool, England / Belfast0
25136.779200male[null]127.00013508CC89[null]0Los Angeles, CA1
2652.000000male[null]1[null]110465SA14[null]0Stoughton, MA0
2725.587500male[null]147.0005727SE58[null]0Victoria, BC0
2883.158300male[null]137.000PC 17756CE52[null]1Lakewood, NJ1
2926.550000male[null]1[null]113791S[null][null]0Roachdale, IN0
3071.000000male[null]170.000WE/P 5735SB222691Milwaukee, WI1
3171.283300male[null]139.000PC 17599CC85[null]0New York, NY1
3252.000000male[null]131.000F.C. 12750SB71[null]0Montreal, PQ1
33106.425000male[null]150.000PC 17761CC86620Deephaven, MN / Cedar Rapids, IA1
3429.700000male[null]139.000PC 17580CA181330Philadelphia, PA0
3531.679200female[null]136.000PC 17531CA29[null]0New York, NY0
36221.779200male[null]1[null]PC 17483SC95[null]0[null]0
3727.750000male[null]130.000113051CC111[null]0New York, NY0
38263.000000male[null]119.00019950SC23 C25 C27[null]2Winnipeg, MB3
39263.000000male[null]164.00019950SC23 C25 C27[null]4Winnipeg, MB1
4026.550000male[null]1[null]113778SD34[null]0Westcliff-on-Sea, Essex0
410.000000male[null]1[null]112058SB102[null]0[null]0
4253.100000male[null]137.000113803SC123[null]0Scituate, MA1
4338.500000male[null]147.000111320SE632750St Anne's-on-Sea, Lancashire0
4479.200000male[null]124.000PC 17593CB86[null]0[null]0
4534.654200male[null]171.000PC 17754CA5[null]0New York, NY0
46153.462500male[null]138.000PC 17582SC911471Winnipeg, MB0
4779.200000male[null]146.000PC 17593CB82 B84[null]0New York, NY0
4842.400000male[null]1[null]113796S[null][null]0[null]0
4983.475000male[null]145.00036973SC83[null]0New York, NY1
500.000000male[null]140.000112059SB941100[null]0
5193.500000male[null]155.00012749SB693071Montreal, PQ1
5242.500000male[null]142.000113038SB11[null]0London / Middlesex0
5351.862500male[null]1[null]17463SE46[null]0Brighton, MA0
5450.000000male[null]155.000680SC39[null]0London / Birmingham0
5552.000000male[null]142.000113789S[null]380New York, NY1
5630.695800male141[null]PC 17600C[null][null]0New York, NY0
5728.712500female[null]150.000PC 17595CC49[null]0Paris, France New York, NY0
5826.000000male[null]146.000694S[null]800Bennington, VT0
5926.000000male[null]150.000113044SE60[null]0London0
60211.500000male[null]132.500113503CC132450[null]0
6129.700000male[null]158.00011771CB372580Buffalo, NY0
6251.862500male[null]141.00017464SD21[null]0Southington / Noank, CT1
6326.550000male[null]1[null]113028SC124[null]0Portland, OR0
6427.720800male[null]1[null]PC 17612C[null][null]0Chicago, IL0
6530.000000male[null]129.000113501SD61260Springfield, MA0
6645.500000male[null]130.000113801S[null][null]0London / New York, NY0
6726.000000male[null]130.000110469SC106[null]0Brockton, MA0
6853.100000male[null]119.000113773SD30[null]0New York, NY1
6975.241700male[null]146.00013050CC62920Vancouver, BC0
7051.862500male[null]154.00017463SE461750Dorchester, MA0
7182.170800male[null]128.000PC 17604C[null][null]0New York, NY1
7226.550000male[null]165.00013509SE382490East Bridgewater, MA0
7390.000000male[null]144.00019928QC782300Fond du Lac, WI2
7430.500000male[null]155.000113787SC30[null]0Montreal, PQ0
7542.400000male[null]147.000113796S[null][null]0Washington, DC0
7629.700000male[null]137.000PC 17596CC118[null]1Brooklyn, NY0
77113.275000male[null]158.00035273CD481222Lexington, MA0
7826.000000male[null]164.000693S[null]2630Isle of Wight, England0
7961.979200male[null]165.000113509CB302341Providence, RI0
8027.720800male[null]128.500PC 17562CD431890?Havana, Cuba0
810.000000male[null]1[null]112052S[null][null]0Belfast0
8228.500000male[null]145.500113043SC1241660Surbiton Hill, Surrey0
8393.500000male[null]123.00012749SB24[null]0Montreal, PQ0
8466.600000male[null]129.000113776SC2[null]0Isleworth, England1
85108.900000male[null]118.000PC 17758CC65[null]0Madrid, Spain1
8652.000000male[null]147.000110465SC1102070Worcester, MA0
870.000000male[null]138.00019972S[null][null]0Rotterdam, Netherlands0
88135.633300male[null]122.000PC 17760C[null]2320[null]0
89227.525000male[null]1[null]PC 17757C[null][null]0[null]0
9050.495800male[null]131.000PC 17590SA24[null]0Trenton, NJ0
9150.000000male[null]1[null]113767SA32[null]0Seattle, WA0
9240.125000male[null]136.00013049CA10[null]0Winnipeg, MB0
9359.400000male[null]155.000PC 17603C[null][null]0New York, NY1
9426.550000male[null]133.000113790S[null]1090London0
95262.375000male[null]161.000PC 17608CB57 B59 B63 B66[null]3Haverford, PA / Cooperstown, NY1
9655.900000male[null]150.00013507SE44[null]0Duluth, MN1
9726.550000male[null]156.000113792S[null][null]0New York, NY0
9830.695800male[null]156.00017764CA7[null]0St James, Long Island, NY0
9960.000000male[null]124.00013695SC31[null]0Huntington, WV1
10026.000000male[null]1[null]113056SA19[null]0Streatham, Surrey0
Out[30]:
Rows: 1-100 of 1234 | Columns: 14

It is also possible to create a vDataFrame using a customized relation.

In [31]:
from verticapy import vdf_from_relation
vdf_from_relation("(SELECT pclass, AVG(survived) AS survived FROM titanic GROUP BY 1) x", 
                  cursor = cur)
123
pclass
Int
123
survived
Float
110.612179487179487
220.416988416988417
330.227752639517345
Out[31]:
Rows: 3 | Columns: 2

In-Memory vs In-DB

To understand the main difference between in-memory and in-DB, let's create a vDataFrame using an existing relation.

In [32]:
import time

start_time = time.time()
expedia = vDataFrame("public.expedia", cur)
display(expedia)
print("elapsed time = {}".format(time.time() - start_time))
123
is_mobile
Int
123
srch_adults_cnt
Int
123
cnt
Int
123
user_location_city
Int
123
user_location_country
Int
123
srch_destination_type_id
Int
123
is_package
Int
123
srch_rm_cnt
Int
123
posa_continent
Int
📅
srch_ci
Date
📅
date_time
Timestamp
123
hotel_market
Int
123
user_id
Int
123
hotel_country
Int
📅
srch_co
Date
123
srch_children_cnt
Int
123
channel
Int
123
is_booking
Int
123
orig_destination_distance
Numeric(11,5)
123
srch_destination_id
Int
123
hotel_continent
Int
123
site_name
Int
123
user_location_region
Int
123
hotel_cluster
Int
1022479976630132013-02-152013-02-12 18:36:5111169190182013-02-160001595.3932088254234852
202148686610132013-02-212013-02-12 18:36:52674829692502013-02-23090198.74790148462231468
303156136060222013-03-052013-02-12 18:36:5249677031442013-03-091503908.87200840762231729
40214699310122013-03-162013-02-12 18:36:5223211734461262013-03-20090[null]82823246036
5041307976610132013-05-102013-02-12 18:36:53411513707502013-05-120901253.9069082662218925
60215076120510132013-05-062013-02-12 18:36:54291237821052013-05-090305018.33370874661141181
7131514646660132013-03-272013-02-12 18:36:5518545760331052013-04-011204893.21680202556233122
8023403546630132013-03-132013-02-12 18:36:561122853883502013-03-160901324.8262010812222691
91242125920550132013-03-022013-02-12 18:36:58682826612502013-03-040901085.94160126542235477
10111487746611132013-04-122013-02-12 18:37:016631063495502013-04-17090[null]82812218970
1103228896610132013-03-222013-02-12 18:37:021007222851462013-03-29000[null]224074217461
1202224286661132013-03-032013-02-12 18:37:041509533703512013-03-070901562.49710122574219626
13025481276660132013-02-232013-02-12 18:37:0411085720582013-03-010301644.52560210594240417
140332531520511132013-09-122013-02-12 18:37:04592487061402013-09-180906166.905101191703435492
15021312066660132013-02-132013-02-12 18:37:081230615724502013-02-1401023.13970122682217470
16025331256660132013-05-042013-02-12 18:37:10697634299502013-05-05000[null]258192236350
17021196776610132013-02-162013-02-12 18:37:16428978113502013-02-17090173.65790432222615
1812154446660132013-03-132013-02-12 18:37:18350166780502013-03-14131422.00270121782222046
1902138883111112013-03-282013-02-12 18:37:1819239117702013-04-040001220.58810825362318917
20021117406610132013-02-142013-02-12 18:37:22592850320502013-02-15091991.83190188112218491
21045236596611232013-05-182013-02-12 18:37:2311087362582013-05-230001223.8704087914232265
22021145667010122013-03-192013-02-12 18:37:231033219481992013-03-28090[null]2683354776
23042168356610232013-02-172013-02-12 18:37:25699720088502013-02-18130[null]113532239272
24021506819111112013-05-022013-02-12 18:37:256751188445502013-05-08010[null]82672284570
250214079330122013-09-142013-02-12 18:37:292121166859502013-09-15090[null]6692244997
26022312066660132013-02-132013-02-12 18:37:301230615724502013-02-1401023.27920122682217498
270457373360222013-03-062013-02-12 18:37:303651179245502013-03-13090[null]122162247914
28023348496610232013-03-012013-02-12 18:37:31276008812042013-03-100903750.2992087456236375
2902120442310122013-02-232013-02-12 18:37:3125511620131062013-02-24011[null]418863245036
30021309586610132013-04-152013-02-12 18:37:374053954321742013-04-220901688.21870116344234844
3104120866660232013-03-222013-02-12 18:37:3914571137395502013-03-26290130.31530285142222070
32023348496610232013-03-012013-02-12 18:37:39276008812042013-03-100903751.9445087456236322
33021880620510132013-02-132013-02-12 18:37:403971710681982013-02-1509038.96120828721135441
3412154446660132013-03-132013-02-12 18:37:43350166780502013-03-14130422.00270121782222046
35021444706611132013-03-302013-02-12 18:37:4468296496502013-04-05210[null]82682231485
36011374496660132013-02-212013-02-12 18:37:454201126467502013-02-22201143.20390246282234891
37026254436660132013-07-092013-02-12 18:37:462784215391352013-07-101903646.5785083376234840
380223169310132013-05-292013-02-12 18:37:476751034717502013-06-01090[null]826722643
39021249266911112013-02-182013-02-12 18:37:4715281157471942013-02-22011[null]3365153756767
40021272686610132013-09-252013-02-12 18:37:49354130631052013-09-280904197.1405087856233764
41025146566610132013-05-012013-02-12 18:37:5014659071712042013-05-020904737.55600189106222062
42021388016610132013-02-162013-02-12 18:37:5010561170201502013-02-17090148.02900549202226016
4312198216611132013-04-042013-02-12 18:37:53671153848502013-04-08000756.7478088432217492
440223814015430132013-02-132013-02-12 18:37:5620578262241192013-02-14090[null]117114219682
450112531520560132013-02-202013-02-12 18:37:56637339033502013-02-23000435.337201219121135472
460313814015411132013-02-262013-02-12 18:37:582328271111262013-03-02001[null]82823219659
4702432796611132013-05-312013-02-12 18:38:00191346460502013-06-07130890.8562082912232268
48131514646660132013-03-272013-02-12 18:38:0418545760331052013-04-011204892.85670202556233158
49125263376611132013-06-132013-02-12 18:38:081108533682013-06-172901005.7701087914244252
5002220866660132013-03-222013-02-12 18:38:0914571137395502013-03-26290130.31530285142222070
51013369886610132013-02-212013-02-12 18:38:0944985222502013-02-22090315.35740244922244794
520114699310122013-02-122013-02-12 18:38:1323211594861262013-02-13001[null]82823246036
53013561534640112013-02-162013-02-12 18:38:14627993963502013-02-230905730.00880287032371726
54021473576610132013-02-152013-02-12 18:38:15643432297502013-02-17000179.4054082442225894
55041307976610132013-05-102013-02-12 18:38:17411513707502013-05-120911253.9069082662218925
5603120866660132013-03-222013-02-12 18:38:1714571137395502013-03-26290130.31530285142222070
57021390586610132013-04-022013-02-12 18:38:18150017396002013-04-061315984.0639082473235164
58022196776610132013-02-162013-02-12 18:38:20428978113502013-02-17090173.5365043222267
5902335176610132013-02-162013-02-12 18:38:21132866021502013-02-1700014.694402212225491
60021238466630132013-03-292013-02-12 18:38:23191675848502013-03-31090212.8159045722233191
610211332020540132013-04-142013-02-12 18:38:253974991631982013-04-15090247.893803099123435459
62021388016610132013-02-162013-02-12 18:38:2610561170201502013-02-17090147.91390549202226042
63021189826611132013-03-132013-02-12 18:38:29628823404502013-03-160001748.930908250222261
640211179746611132013-08-012013-02-12 18:38:29628769230502013-08-040302245.4558082502234879
65031114096660132013-03-012013-02-12 18:38:33648157089502013-03-03110359.79180237922218459
660244079330122013-09-142013-02-12 18:38:332121166859502013-09-15090[null]6692244992
670352531520511132013-09-192013-02-12 18:38:36592487061402013-09-250906165.925101191703435492
68022488626611132013-05-242013-02-12 18:38:41150827717472013-05-310901560.0150088644234852
69023272686610132013-09-252013-02-12 18:38:42354130631052013-09-280904196.8477087856233786
700414232819410122013-03-172013-02-12 18:38:43663693891502013-03-20100[null]82812333818
7112127936131112013-11-182013-02-12 18:38:48734427131522013-11-270904822.72100106032315111
72011156276660132013-02-132013-02-12 18:38:50703711733502013-02-14091[null]285852222091
73024316746610132013-03-062013-02-12 18:38:54682684330502013-03-14090853.6661082682244254
74021473576610132013-02-152013-02-12 18:38:56643432297502013-02-17000185.2033082442225891
75