The Virtual DataFrame
The Virtual DataFrame is the main object and star of the library, and it acts as the perfect transition between small and Big Data. The principle is simple: since Vertica is a powerful columnar, massively parallel processing (MPP) database with many built-in functions, we want it to do as much of the computation work as possible.
Indeed, columnar orientation allows for high compression, and its structure inherently avoids unnecessary parsing when retrieving data. MPP lets us parallelize our computations across the different nodes.
The best way to take advantage of your data is by simply keeping it in your Vertica database, rather than within the limitations of working memory. VerticaPy pushes all computation to your Vertica database before aggregating the final result, so you can get the best of both worlds: Vertica's power and Python's flexibility.
With Python, it's easy to add abstractions, and the vDataFrame acts as the primary abstraction layer. Simple but powerful, it'll help any user through the data science life cycle.
Creating the vDataFrame
There are two main ways to create a vDataFrame.
The first is to create one directly using an existing relation.
from verticapy.datasets import load_titanic
load_titanic() # Loading the titanic dataset in Vertica
import verticapy as vp
vp.vDataFrame("public.titanic")
We can also create one using a customized relation.
vp.vDataFrame(sql = "SELECT pclass, AVG(survived) AS survived FROM titanic GROUP BY 1")
In-memory vs. In-database Loading and Processing
First, let's load the expedia dataset in Vertica.
vp.read_csv("data/expedia.csv", schema = "public", parse_nrows = 2000)
To understand the main difference between loading data into memory and loading data into a Vertica database, let's create a vDataFrame using an existing relation.
import time
start_time = time.time()
expedia = vp.vDataFrame("public.expedia")
print("elapsed time = {}".format(time.time() - start_time))
It took less than a second to create a vDataFrame. This dataset comes in at 6GB, which would be very expensive to hold on a personal machine, so we keep the data entirely in Vertica; nothing is loaded into memory.
Let's compare this to loading the data into memory with pandas. You can try to load the entire dataset in your computer if you have at least 8GB of memory.
import pandas as pd
L_nrows = [10000, 100000, 1000000, 2000000, 5000000, 10000000, 20000000]
L_time = []
for nrows in L_nrows:
    start_time = time.time()
    expedia_df = pd.read_csv("data/expedia.csv", nrows = nrows)
    elapsed_time = time.time() - start_time
    L_time.append(elapsed_time)
    print("nrows = {}; elapsed time = {}".format(nrows, elapsed_time))
Loading data into pandas is quite fast when the data volume is low (a few megabytes), but as the size of the dataset increases, loading becomes drastically more expensive.
import matplotlib.pyplot as plt
plt.plot(L_nrows, L_time)
plt.show()
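If the data must nonetheless stay on the pandas side, one common mitigation is chunked reading, which bounds memory usage but gives up whole-dataset operations such as a correlation matrix. A minimal, self-contained sketch (the inline CSV and column names are illustrative stand-ins, not part of the expedia dataset):

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for a file too large to load at once.
csv_data = io.StringIO(
    "is_booking,cnt\n"
    "1,3\n"
    "0,5\n"
    "1,2\n"
    "0,4\n"
)

total = 0
for chunk in pd.read_csv(csv_data, chunksize=2):
    # Only one small chunk is materialized in memory at a time.
    total += chunk["cnt"].sum()

print(total)  # 14
```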
Performance will also drastically decrease.
start_time = time.time()
expedia_df.corr()
print("elapsed time = {}".format(time.time() - start_time))
We're only using a little more than half of the dataset, and it took almost 2 minutes to compute the correlation matrix.
Let's compute the entire correlation matrix using the vDataFrame.
start_time = time.time()
expedia.corr(show = False)
print("elapsed time = {}".format(time.time() - start_time))
It took almost 1 minute and 30 seconds on a single Community Edition Vertica cluster, without using any Vertica-specific features that might increase performance, such as projections or compression tuning.
VerticaPy caches the computed aggregations. With this cache available, we can repeat the correlation matrix computation almost instantaneously.
start_time = time.time()
expedia.corr(show = False)
print("elapsed time = {}".format(time.time() - start_time))
If needed, the cache can be deactivated.
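A one-line sketch of deactivating it (this assumes the 'cache' entry of 'set_option', which may vary across VerticaPy versions):

```python
import verticapy as vp

# Disable aggregation caching; every aggregation is recomputed in Vertica.
vp.set_option("cache", False)
```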
Let's look at the memory usage for less than half of the dataset: Pandas is taking more than 3.6 GB.
expedia_df.info()
Let's compare that to the total memory usage of the vDataFrame; but instead of loading half the dataset, we'll load the entire dataset: less than 44KB!
The vDataFrame remembers the user's modifications to the data, but never loads the data itself into memory.
expedia.memory_usage()
We can see a clear difference. With VerticaPy, we can take advantage of Vertica's structure and scalability and run fast queries without ever loading the data into memory. In-memory processing is limited by many factors, which often forces downsampling.
The Structure of the vDataFrame
A vDataFrame is composed of columns called vColumns. You can see these with the 'get_columns' method.
expedia.get_columns()
To access a vColumn, simply write its name between square brackets.
expedia["is_booking"]
VerticaPy is smart enough to not recompute an aggregation that it's already computed.
expedia["is_booking"].describe()
Each vColumn has its own catalog to save user modifications.
For example, we previously computed some aggregations for the column 'is_booking'. Let's look at the catalog of the vColumn.
expedia["is_booking"].catalog
It will save the most important aggregations to avoid recomputation. The catalog will be updated whenever we make major changes to our data.
We can also view the vDataFrame's backend SQL code generation by setting 'sql_on' with the 'set_option' function.
vp.set_option("sql_on", True)
expedia["cnt"].describe()
You can also display the elapsed time of the different queries by setting 'time_on' with the 'set_option' function. For example, let's compute the correlation matrix of the vDataFrame.
vp.set_option("time_on", True)
Note: In order to display matplotlib graphics in Jupyter, you'll need to use the '%matplotlib inline' command the first time you decide to draw a graphic.
expedia = vp.vDataFrame("public.expedia") # creating a new vDataFrame to delete the catalog
%matplotlib inline
expedia.corr()
All heavy computations are pushed to Vertica, and each aggregation is saved to each vColumn's catalog. If we call the 'corr' method again, it'll only take a couple seconds (time needed to draw the graphic).
start_time = time.time()
expedia.corr()
print("elapsed time = {}".format(time.time() - start_time))
We can turn off the display of the elapsed time and the SQL code generation.
vp.set_option("sql_on", False)
vp.set_option("time_on", False)
You can access the current vDataFrame relation with the 'current_relation' method.
print(expedia.current_relation())
Since we're working with SQL code generation, this relation will change according to the user's modifications. For example, let's impute the missing values of the vColumn 'orig_destination_distance' by its average and drop the vColumn 'is_package'.
expedia["orig_destination_distance"].fillna(method = "avg")
expedia["is_package"].drop()
print(expedia.current_relation())
Notice how dropping the vColumn 'is_package' simply removes it from the SELECT statement in our SQL query. Similarly, imputing a vColumn translates to the 'COALESCE' SQL function.
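For intuition, the rewritten relation looks roughly like this hand-written sketch (abridged; not the exact output of 'current_relation()', and the imputed average is left as a placeholder):

```sql
-- "is_package" simply disappears from the column list, and the missing
-- distances are imputed via COALESCE with a precomputed average.
SELECT
    COALESCE("orig_destination_distance", <average>) AS "orig_destination_distance",
    -- ... all remaining columns except the dropped "is_package" ...
FROM "public"."expedia"
```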
vDataFrame Attributes and Management
As we saw, the vDataFrame has many attributes and methods. vDataFrames have two types of attributes:
- Virtual Columns
- Main Attributes (columns, main_relation ...)
The vDataFrame's main attributes are stored in the _VERTICAPY_VARIABLES_ dictionary.
expedia._VERTICAPY_VARIABLES_
You should never change these attributes manually.
vDataFrame Data Types
The vDataFrame's behavior adapts to the data types of its vColumns: computing a histogram for a numerical data type is not the same as computing one for a categorical data type. The vDataFrame identifies four main categories:
- int: integers, treated as categorical data types when their cardinality is low and as numeric otherwise
- float: numerics
- date: date-like data types
- text: categorical data types
Other data types may be automatically treated as categorical. You can examine these different data types using the 'dtypes' method.
expedia.dtypes()
You can perform conversions with the 'astype' method.
expedia["hotel_market"].astype("varchar")
expedia["hotel_market"].ctype()
You can also get the vColumn category using the 'category' method.
expedia["hotel_market"].category()
Exporting / Saving / Loading a vDataFrame
The functions 'save' and 'load' allow the user to save and load their vDataFrame structure.
expedia.save()
expedia.filter("is_booking = 1")
In this example, we filtered some data and want to go back to the previous structure.
expedia = expedia.load()
print(expedia.shape())
Don't forget to use the help function when you need more information about the different functions!
help(expedia.load)
The vDataFrame works the same way as a view: nothing is stored in the database unless you do so explicitly with the 'to_db' method. If you want to save the result into a table, be sure to check the expected disk usage of exporting the vDataFrame first.
expedia.expected_store_usage(unit = "Gb")
After we decide that we have the space to store the vDataFrame, we can store it in our database.
expedia.to_db("public.expedia_clean",
              relation_type = "table")