The Virtual DataFrame#
The Virtual DataFrame (vDataFrame) is the core object of the VerticaPy library. Leveraging the power of Vertica and the flexibility of Python, the vDataFrame is a Python object that lets you manipulate the data representation in a Vertica database without modifying the underlying data. The data represented by a vDataFrame remains in the Vertica database, bypassing the limitations of working memory. When a vDataFrame is created or altered, VerticaPy formulates the operation as an SQL query and pushes the computation to the Vertica database, harnessing Vertica’s massive parallel processing and in-built functions. Vertica then aggregates and returns the result to VerticaPy. In essence, vDataFrames behave similar to views in the Vertica database.
For more information about Vertica’s performance advantages, including its columnar orientation and parallelization across nodes, see the Vertica documentation.
In the following tutorial, we will introduce the basic functionality of the vDataFrame and then explore the ways in which they utilize in-database processing to enhance performance.
Creating vDataFrames#
First, run the load_titanic() function to ingest into Vertica a dataset with information about titanic passengers:
[2]:
from verticapy.datasets import load_titanic
# By default, the table is created in the `public` schema
load_titanic()
[2]:
123 pclassInteger | 123 survivedInteger | Abc Varchar(164) | Abc sexVarchar(20) | 123 ageNumeric(8) | 123 sibspInteger | 123 parchInteger | Abc ticketVarchar(36) | 123 fareNumeric(12) | Abc cabinVarchar(30) | Abc embarkedVarchar(20) | Abc boatVarchar(100) | 123 bodyInteger | Abc Varchar(100) | |
1 | 1 | 0 | male | 47.0 | 1 | 0 | PC 17757 | 227.525 | C62 C64 | C | [null] | 124 | ||
2 | 1 | 0 | male | 24.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C | [null] | [null] | ||
3 | 1 | 0 | male | 25.0 | 0 | 0 | 13905 | 26.0 | [null] | C | [null] | 148 | ||
4 | 1 | 0 | male | 42.0 | 0 | 0 | 110489 | 26.55 | D22 | S | [null] | [null] | ||
5 | 1 | 0 | male | 45.0 | 0 | 0 | 113050 | 26.55 | B38 | S | [null] | [null] | ||
6 | 1 | 0 | male | 46.0 | 1 | 0 | W.E.P. 5734 | 61.175 | E31 | S | [null] | [null] | ||
7 | 1 | 0 | male | 64.0 | 1 | 4 | 19950 | 263.0 | C23 C25 C27 | S | [null] | [null] | ||
8 | 1 | 0 | male | [null] | 0 | 0 | 113796 | 42.4 | [null] | S | [null] | [null] | ||
9 | 1 | 0 | male | 32.5 | 0 | 0 | 113503 | 211.5 | C132 | C | [null] | 45 | ||
10 | 1 | 0 | male | 55.0 | 0 | 0 | 113787 | 30.5 | C30 | S | [null] | [null] | ||
11 | 1 | 0 | male | 37.0 | 0 | 1 | PC 17596 | 29.7 | C118 | C | [null] | [null] | ||
12 | 1 | 0 | male | 64.0 | 0 | 0 | 693 | 26.0 | [null] | S | [null] | 263 | ||
13 | 1 | 0 | male | 56.0 | 0 | 0 | 113792 | 26.55 | [null] | S | [null] | [null] | ||
14 | 1 | 0 | male | 24.0 | 1 | 0 | 13695 | 60.0 | C31 | S | [null] | [null] | ||
15 | 1 | 1 | male | 0.92 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11 | [null] | ||
16 | 1 | 1 | female | 18.0 | 1 | 0 | PC 17757 | 227.525 | C62 C64 | C | 4 | [null] | ||
17 | 1 | 1 | male | 80.0 | 0 | 0 | 27042 | 30.0 | A23 | S | B | [null] | ||
18 | 1 | 1 | female | 42.0 | 0 | 0 | PC 17757 | 227.525 | [null] | C | 4 | [null] | ||
19 | 1 | 1 | male | 25.0 | 1 | 0 | 11967 | 91.0792 | B49 | C | 7 | [null] | ||
20 | 1 | 1 | female | 45.0 | 0 | 0 | PC 17608 | 262.375 | [null] | C | 4 | [null] | ||
21 | 1 | 1 | female | 22.0 | 0 | 1 | 113505 | 55.0 | E33 | S | 6 | [null] | ||
22 | 1 | 1 | female | [null] | 0 | 0 | 17770 | 27.7208 | [null] | C | 5 | [null] | ||
23 | 1 | 1 | female | 22.0 | 0 | 0 | 113781 | 151.55 | [null] | S | 11 | [null] | ||
24 | 1 | 1 | female | 64.0 | 0 | 2 | PC 17756 | 83.1583 | E45 | C | 14 | [null] | ||
25 | 1 | 1 | female | 54.0 | 1 | 0 | 36947 | 78.2667 | D20 | C | 4 | [null] | ||
26 | 1 | 1 | male | 43.0 | 1 | 0 | 17765 | 27.7208 | D40 | C | 5 | [null] | ||
27 | 1 | 1 | female | 22.0 | 0 | 2 | 13568 | 49.5 | B39 | C | 5 | [null] | ||
28 | 1 | 1 | male | 23.0 | 0 | 1 | PC 17759 | 63.3583 | D10 D12 | C | 7 | [null] | ||
29 | 1 | 1 | female | 35.0 | 1 | 0 | 113789 | 52.0 | [null] | S | 8 | [null] | ||
30 | 1 | 1 | female | [null] | 1 | 0 | 17464 | 51.8625 | D21 | S | 8 | [null] | ||
31 | 1 | 1 | male | 42.0 | 1 | 0 | 11753 | 52.5542 | D19 | S | 5 | [null] | ||
32 | 1 | 1 | female | 45.0 | 1 | 0 | 11753 | 52.5542 | D19 | S | 5 | [null] | ||
33 | 1 | 1 | female | 16.0 | 0 | 1 | PC 17592 | 39.4 | D28 | S | 9 | [null] | ||
34 | 1 | 1 | female | 21.0 | 0 | 0 | 13502 | 77.9583 | D9 | S | 10 | [null] | ||
35 | 1 | 1 | male | 36.0 | 0 | 0 | PC 17473 | 26.2875 | E25 | S | 7 | [null] | ||
36 | 1 | 1 | male | [null] | 0 | 0 | F.C. 12998 | 25.7417 | [null] | C | 7 | [null] | ||
37 | 1 | 1 | female | 33.0 | 0 | 0 | PC 17613 | 27.7208 | A11 | C | 11 | [null] | ||
38 | 1 | 1 | female | 48.0 | 1 | 3 | PC 17608 | 262.375 | B57 B59 B63 B66 | C | 4 | [null] | ||
39 | 1 | 1 | female | 23.0 | 1 | 0 | 21228 | 82.2667 | B45 | S | 7 | [null] | ||
40 | 1 | 1 | female | 31.0 | 0 | 2 | 36928 | 164.8667 | C7 | S | 8 | [null] | ||
41 | 2 | 0 | male | 57.0 | 0 | 0 | 244346 | 13.0 | [null] | S | [null] | [null] | ||
42 | 2 | 0 | female | 29.0 | 1 | 0 | SC/AH 29037 | 26.0 | [null] | S | [null] | [null] | ||
43 | 2 | 0 | male | 29.0 | 0 | 0 | W./C. 14263 | 10.5 | [null] | S | [null] | [null] | ||
44 | 2 | 0 | female | 30.0 | 0 | 0 | 237249 | 13.0 | [null] | S | [null] | [null] | ||
45 | 2 | 0 | male | 24.0 | 0 | 0 | 248726 | 13.5 | [null] | S | [null] | 297 | ||
46 | 2 | 0 | male | 30.0 | 0 | 0 | 250646 | 13.0 | [null] | S | [null] | 305 | ||
47 | 2 | 0 | male | 52.0 | 0 | 0 | 250647 | 13.0 | [null] | S | [null] | 19 | ||
48 | 2 | 0 | male | 44.0 | 1 | 0 | 26707 | 26.0 | [null] | S | [null] | [null] | ||
49 | 2 | 0 | male | 57.0 | 0 | 0 | 219533 | 12.35 | [null] | Q | [null] | [null] | ||
50 | 2 | 0 | male | 32.0 | 0 | 0 | 237216 | 13.5 | [null] | S | [null] | 209 | ||
51 | 2 | 0 | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5 | [null] | S | [null] | [null] | ||
52 | 2 | 0 | male | 54.0 | 0 | 0 | 29011 | 14.0 | [null] | S | [null] | [null] | ||
53 | 2 | 0 | male | 16.0 | 0 | 0 | S.O./P.P. 3 | 10.5 | [null] | S | [null] | [null] | ||
54 | 2 | 0 | male | 62.0 | 0 | 0 | 240276 | 9.6875 | [null] | Q | [null] | [null] | ||
55 | 2 | 0 | male | 27.0 | 0 | 0 | SC/PARIS 2168 | 15.0333 | [null] | C | [null] | [null] | ||
56 | 2 | 0 | male | 36.0 | 0 | 0 | C.A. 17248 | 10.5 | [null] | S | [null] | [null] | ||
57 | 2 | 0 | female | 27.0 | 1 | 0 | 11668 | 21.0 | [null] | S | [null] | [null] | ||
58 | 2 | 0 | male | 27.0 | 1 | 0 | 228414 | 26.0 | [null] | S | [null] | 293 | ||
59 | 2 | 0 | male | 66.0 | 0 | 0 | C.A. 24579 | 10.5 | [null] | S | [null] | [null] | ||
60 | 2 | 1 | female | 48.0 | 0 | 2 | C.A. 33112 | 36.75 | [null] | S | 14 | [null] | ||
61 | 2 | 1 | female | 30.0 | 1 | 0 | SC/PARIS 2148 | 13.8583 | [null] | C | 12 | [null] | ||
62 | 2 | 1 | female | 34.0 | 0 | 0 | 243880 | 13.0 | [null] | S | 12 | [null] | ||
63 | 2 | 1 | female | 24.0 | 0 | 2 | 250649 | 14.5 | [null] | S | 4 | [null] | ||
64 | 2 | 1 | female | 48.0 | 1 | 2 | 220845 | 65.0 | [null] | S | 9 | [null] | ||
65 | 2 | 1 | female | 3.0 | 1 | 2 | SC/Paris 2123 | 41.5792 | [null] | C | 14 | [null] | ||
66 | 2 | 1 | male | 3.0 | 1 | 1 | 230080 | 26.0 | F2 | S | D | [null] | ||
67 | 2 | 1 | female | 2.0 | 1 | 1 | 26360 | 26.0 | [null] | S | 11 | [null] | ||
68 | 2 | 1 | female | 18.0 | 0 | 2 | 250652 | 13.0 | [null] | S | 16 | [null] | ||
69 | 3 | 0 | male | 30.0 | 0 | 0 | C 7076 | 7.25 | [null] | S | [null] | 72 | ||
70 | 3 | 0 | female | 40.0 | 1 | 0 | 7546 | 9.475 | [null] | S | [null] | [null] | ||
71 | 3 | 0 | male | 35.0 | 0 | 0 | 373450 | 8.05 | [null] | S | [null] | [null] | ||
72 | 3 | 0 | female | 18.0 | 0 | 1 | 2691 | 14.4542 | [null] | C | [null] | [null] | ||
73 | 3 | 0 | male | [null] | 0 | 0 | 2664 | 7.225 | [null] | C | [null] | [null] | ||
74 | 3 | 0 | female | 32.0 | 1 | 1 | 364849 | 15.5 | [null] | Q | [null] | [null] | ||
75 | 3 | 0 | male | 22.0 | 0 | 0 | 350045 | 7.7958 | [null] | S | [null] | [null] | ||
76 | 3 | 0 | male | 35.0 | 0 | 0 | 364512 | 8.05 | [null] | S | [null] | [null] | ||
77 | 3 | 0 | male | [null] | 1 | 0 | 2689 | 14.4583 | [null] | C | [null] | [null] | ||
78 | 3 | 0 | female | [null] | 1 | 0 | 2689 | 14.4583 | [null] | C | [null] | [null] | ||
79 | 3 | 0 | male | 31.0 | 0 | 0 | 335097 | 7.75 | [null] | Q | [null] | [null] | ||
80 | 3 | 0 | female | 22.0 | 0 | 0 | 7552 | 10.5167 | [null] | S | [null] | [null] | ||
81 | 3 | 0 | male | 0.33 | 0 | 2 | 347080 | 14.4 | [null] | S | [null] | [null] | ||
82 | 3 | 0 | male | 34.0 | 1 | 1 | 347080 | 14.4 | [null] | S | [null] | 197 | ||
83 | 3 | 0 | female | 28.0 | 1 | 1 | 347080 | 14.4 | [null] | S | [null] | [null] | ||
84 | 3 | 0 | male | 25.0 | 0 | 0 | 349203 | 7.8958 | [null] | S | [null] | [null] | ||
85 | 3 | 0 | male | 25.0 | 0 | 0 | 349250 | 7.8958 | [null] | S | [null] | [null] | ||
86 | 3 | 0 | male | [null] | 0 | 0 | 349238 | 7.8958 | [null] | S | [null] | [null] | ||
87 | 3 | 0 | male | [null] | 0 | 0 | 349225 | 7.8958 | [null] | S | [null] | [null] | ||
88 | 3 | 0 | male | 22.0 | 0 | 0 | A/5 21172 | 7.25 | [null] | S | [null] | [null] | ||
89 | 3 | 0 | male | [null] | 0 | 0 | 2674 | 7.225 | [null] | C | [null] | [null] | ||
90 | 3 | 0 | male | [null] | 0 | 0 | 2631 | 7.225 | [null] | C | [null] | [null] | ||
91 | 3 | 0 | male | 40.5 | 0 | 0 | C.A. 6212 | 15.1 | [null] | S | [null] | 187 | ||
92 | 3 | 0 | male | 40.5 | 0 | 0 | 367232 | 7.75 | [null] | Q | [null] | 68 | ||
93 | 3 | 0 | male | 18.0 | 0 | 0 | 350036 | 7.7958 | [null] | S | [null] | [null] | ||
94 | 3 | 0 | female | [null] | 0 | 0 | 364859 | 7.75 | [null] | Q | [null] | [null] | ||
95 | 3 | 0 | male | [null] | 0 | 0 | 349254 | 7.8958 | [null] | C | [null] | [null] | ||
96 | 3 | 0 | male | 40.0 | 1 | 6 | CA 2144 | 46.9 | [null] | S | [null] | [null] | ||
97 | 3 | 0 | male | 51.0 | 0 | 0 | 21440 | 8.05 | [null] | S | [null] | [null] | ||
98 | 3 | 0 | male | 26.0 | 1 | 0 | 350025 | 7.8542 | [null] | S | [null] | [null] | ||
99 | 3 | 0 | female | [null] | 0 | 0 | 382649 | 7.75 | [null] | Q | [null] | [null] | ||
100 | 3 | 0 | male | [null] | 0 | 0 | 349220 | 7.8958 | [null] | S | [null] | [null] |
You can create a vDataFrame from either an existing relation or a customized relation.
To create a vDataFrame using an existing relation, in this case the Titanic dataset, provide the name of the dataset:
[3]:
import verticapy as vp
vp.vDataFrame("public.titanic")
[3]:
123 pclassInteger | 123 survivedInteger | Abc Varchar(164) | Abc sexVarchar(20) | 123 ageNumeric(8) | 123 sibspInteger | 123 parchInteger | Abc ticketVarchar(36) | 123 fareNumeric(12) | Abc cabinVarchar(30) | Abc embarkedVarchar(20) | Abc boatVarchar(100) | 123 bodyInteger | Abc Varchar(100) | |
1 | 1 | 0 | male | 47.0 | 1 | 0 | PC 17757 | 227.525 | C62 C64 | C | [null] | 124 | ||
2 | 1 | 0 | male | 24.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C | [null] | [null] | ||
3 | 1 | 0 | male | 25.0 | 0 | 0 | 13905 | 26.0 | [null] | C | [null] | 148 | ||
4 | 1 | 0 | male | 42.0 | 0 | 0 | 110489 | 26.55 | D22 | S | [null] | [null] | ||
5 | 1 | 0 | male | 45.0 | 0 | 0 | 113050 | 26.55 | B38 | S | [null] | [null] | ||
6 | 1 | 0 | male | 46.0 | 1 | 0 | W.E.P. 5734 | 61.175 | E31 | S | [null] | [null] | ||
7 | 1 | 0 | male | 64.0 | 1 | 4 | 19950 | 263.0 | C23 C25 C27 | S | [null] | [null] | ||
8 | 1 | 0 | male | [null] | 0 | 0 | 113796 | 42.4 | [null] | S | [null] | [null] | ||
9 | 1 | 0 | male | 32.5 | 0 | 0 | 113503 | 211.5 | C132 | C | [null] | 45 | ||
10 | 1 | 0 | male | 55.0 | 0 | 0 | 113787 | 30.5 | C30 | S | [null] | [null] | ||
11 | 1 | 0 | male | 37.0 | 0 | 1 | PC 17596 | 29.7 | C118 | C | [null] | [null] | ||
12 | 1 | 0 | male | 64.0 | 0 | 0 | 693 | 26.0 | [null] | S | [null] | 263 | ||
13 | 1 | 0 | male | 56.0 | 0 | 0 | 113792 | 26.55 | [null] | S | [null] | [null] | ||
14 | 1 | 0 | male | 24.0 | 1 | 0 | 13695 | 60.0 | C31 | S | [null] | [null] | ||
15 | 1 | 1 | male | 0.92 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11 | [null] | ||
16 | 1 | 1 | female | 18.0 | 1 | 0 | PC 17757 | 227.525 | C62 C64 | C | 4 | [null] | ||
17 | 1 | 1 | male | 80.0 | 0 | 0 | 27042 | 30.0 | A23 | S | B | [null] | ||
18 | 1 | 1 | female | 42.0 | 0 | 0 | PC 17757 | 227.525 | [null] | C | 4 | [null] | ||
19 | 1 | 1 | male | 25.0 | 1 | 0 | 11967 | 91.0792 | B49 | C | 7 | [null] | ||
20 | 1 | 1 | female | 45.0 | 0 | 0 | PC 17608 | 262.375 | [null] | C | 4 | [null] | ||
21 | 1 | 1 | female | 22.0 | 0 | 1 | 113505 | 55.0 | E33 | S | 6 | [null] | ||
22 | 1 | 1 | female | [null] | 0 | 0 | 17770 | 27.7208 | [null] | C | 5 | [null] | ||
23 | 1 | 1 | female | 22.0 | 0 | 0 | 113781 | 151.55 | [null] | S | 11 | [null] | ||
24 | 1 | 1 | female | 64.0 | 0 | 2 | PC 17756 | 83.1583 | E45 | C | 14 | [null] | ||
25 | 1 | 1 | female | 54.0 | 1 | 0 | 36947 | 78.2667 | D20 | C | 4 | [null] | ||
26 | 1 | 1 | male | 43.0 | 1 | 0 | 17765 | 27.7208 | D40 | C | 5 | [null] | ||
27 | 1 | 1 | female | 22.0 | 0 | 2 | 13568 | 49.5 | B39 | C | 5 | [null] | ||
28 | 1 | 1 | male | 23.0 | 0 | 1 | PC 17759 | 63.3583 | D10 D12 | C | 7 | [null] | ||
29 | 1 | 1 | female | 35.0 | 1 | 0 | 113789 | 52.0 | [null] | S | 8 | [null] | ||
30 | 1 | 1 | female | [null] | 1 | 0 | 17464 | 51.8625 | D21 | S | 8 | [null] | ||
31 | 1 | 1 | male | 42.0 | 1 | 0 | 11753 | 52.5542 | D19 | S | 5 | [null] | ||
32 | 1 | 1 | female | 45.0 | 1 | 0 | 11753 | 52.5542 | D19 | S | 5 | [null] | ||
33 | 1 | 1 | female | 16.0 | 0 | 1 | PC 17592 | 39.4 | D28 | S | 9 | [null] | ||
34 | 1 | 1 | female | 21.0 | 0 | 0 | 13502 | 77.9583 | D9 | S | 10 | [null] | ||
35 | 1 | 1 | male | 36.0 | 0 | 0 | PC 17473 | 26.2875 | E25 | S | 7 | [null] | ||
36 | 1 | 1 | male | [null] | 0 | 0 | F.C. 12998 | 25.7417 | [null] | C | 7 | [null] | ||
37 | 1 | 1 | female | 33.0 | 0 | 0 | PC 17613 | 27.7208 | A11 | C | 11 | [null] | ||
38 | 1 | 1 | female | 48.0 | 1 | 3 | PC 17608 | 262.375 | B57 B59 B63 B66 | C | 4 | [null] | ||
39 | 1 | 1 | female | 23.0 | 1 | 0 | 21228 | 82.2667 | B45 | S | 7 | [null] | ||
40 | 1 | 1 | female | 31.0 | 0 | 2 | 36928 | 164.8667 | C7 | S | 8 | [null] | ||
41 | 2 | 0 | male | 57.0 | 0 | 0 | 244346 | 13.0 | [null] | S | [null] | [null] | ||
42 | 2 | 0 | female | 29.0 | 1 | 0 | SC/AH 29037 | 26.0 | [null] | S | [null] | [null] | ||
43 | 2 | 0 | male | 29.0 | 0 | 0 | W./C. 14263 | 10.5 | [null] | S | [null] | [null] | ||
44 | 2 | 0 | female | 30.0 | 0 | 0 | 237249 | 13.0 | [null] | S | [null] | [null] | ||
45 | 2 | 0 | male | 24.0 | 0 | 0 | 248726 | 13.5 | [null] | S | [null] | 297 | ||
46 | 2 | 0 | male | 30.0 | 0 | 0 | 250646 | 13.0 | [null] | S | [null] | 305 | ||
47 | 2 | 0 | male | 52.0 | 0 | 0 | 250647 | 13.0 | [null] | S | [null] | 19 | ||
48 | 2 | 0 | male | 44.0 | 1 | 0 | 26707 | 26.0 | [null] | S | [null] | [null] | ||
49 | 2 | 0 | male | 57.0 | 0 | 0 | 219533 | 12.35 | [null] | Q | [null] | [null] | ||
50 | 2 | 0 | male | 32.0 | 0 | 0 | 237216 | 13.5 | [null] | S | [null] | 209 | ||
51 | 2 | 0 | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5 | [null] | S | [null] | [null] | ||
52 | 2 | 0 | male | 54.0 | 0 | 0 | 29011 | 14.0 | [null] | S | [null] | [null] | ||
53 | 2 | 0 | male | 16.0 | 0 | 0 | S.O./P.P. 3 | 10.5 | [null] | S | [null] | [null] | ||
54 | 2 | 0 | male | 62.0 | 0 | 0 | 240276 | 9.6875 | [null] | Q | [null] | [null] | ||
55 | 2 | 0 | male | 27.0 | 0 | 0 | SC/PARIS 2168 | 15.0333 | [null] | C | [null] | [null] | ||
56 | 2 | 0 | male | 36.0 | 0 | 0 | C.A. 17248 | 10.5 | [null] | S | [null] | [null] | ||
57 | 2 | 0 | female | 27.0 | 1 | 0 | 11668 | 21.0 | [null] | S | [null] | [null] | ||
58 | 2 | 0 | male | 27.0 | 1 | 0 | 228414 | 26.0 | [null] | S | [null] | 293 | ||
59 | 2 | 0 | male | 66.0 | 0 | 0 | C.A. 24579 | 10.5 | [null] | S | [null] | [null] | ||
60 | 2 | 1 | female | 48.0 | 0 | 2 | C.A. 33112 | 36.75 | [null] | S | 14 | [null] | ||
61 | 2 | 1 | female | 30.0 | 1 | 0 | SC/PARIS 2148 | 13.8583 | [null] | C | 12 | [null] | ||
62 | 2 | 1 | female | 34.0 | 0 | 0 | 243880 | 13.0 | [null] | S | 12 | [null] | ||
63 | 2 | 1 | female | 24.0 | 0 | 2 | 250649 | 14.5 | [null] | S | 4 | [null] | ||
64 | 2 | 1 | female | 48.0 | 1 | 2 | 220845 | 65.0 | [null] | S | 9 | [null] | ||
65 | 2 | 1 | female | 3.0 | 1 | 2 | SC/Paris 2123 | 41.5792 | [null] | C | 14 | [null] | ||
66 | 2 | 1 | male | 3.0 | 1 | 1 | 230080 | 26.0 | F2 | S | D | [null] | ||
67 | 2 | 1 | female | 2.0 | 1 | 1 | 26360 | 26.0 | [null] | S | 11 | [null] | ||
68 | 2 | 1 | female | 18.0 | 0 | 2 | 250652 | 13.0 | [null] | S | 16 | [null] | ||
69 | 3 | 0 | male | 30.0 | 0 | 0 | C 7076 | 7.25 | [null] | S | [null] | 72 | ||
70 | 3 | 0 | female | 40.0 | 1 | 0 | 7546 | 9.475 | [null] | S | [null] | [null] | ||
71 | 3 | 0 | male | 35.0 | 0 | 0 | 373450 | 8.05 | [null] | S | [null] | [null] | ||
72 | 3 | 0 | female | 18.0 | 0 | 1 | 2691 | 14.4542 | [null] | C | [null] | [null] | ||
73 | 3 | 0 | male | [null] | 0 | 0 | 2664 | 7.225 | [null] | C | [null] | [null] | ||
74 | 3 | 0 | female | 32.0 | 1 | 1 | 364849 | 15.5 | [null] | Q | [null] | [null] | ||
75 | 3 | 0 | male | 22.0 | 0 | 0 | 350045 | 7.7958 | [null] | S | [null] | [null] | ||
76 | 3 | 0 | male | 35.0 | 0 | 0 | 364512 | 8.05 | [null] | S | [null] | [null] | ||
77 | 3 | 0 | male | [null] | 1 | 0 | 2689 | 14.4583 | [null] | C | [null] | [null] | ||
78 | 3 | 0 | female | [null] | 1 | 0 | 2689 | 14.4583 | [null] | C | [null] | [null] | ||
79 | 3 | 0 | male | 31.0 | 0 | 0 | 335097 | 7.75 | [null] | Q | [null] | [null] | ||
80 | 3 | 0 | female | 22.0 | 0 | 0 | 7552 | 10.5167 | [null] | S | [null] | [null] | ||
81 | 3 | 0 | male | 0.33 | 0 | 2 | 347080 | 14.4 | [null] | S | [null] | [null] | ||
82 | 3 | 0 | male | 34.0 | 1 | 1 | 347080 | 14.4 | [null] | S | [null] | 197 | ||
83 | 3 | 0 | female | 28.0 | 1 | 1 | 347080 | 14.4 | [null] | S | [null] | [null] | ||
84 | 3 | 0 | male | 25.0 | 0 | 0 | 349203 | 7.8958 | [null] | S | [null] | [null] | ||
85 | 3 | 0 | male | 25.0 | 0 | 0 | 349250 | 7.8958 | [null] | S | [null] | [null] | ||
86 | 3 | 0 | male | [null] | 0 | 0 | 349238 | 7.8958 | [null] | S | [null] | [null] | ||
87 | 3 | 0 | male | [null] | 0 | 0 | 349225 | 7.8958 | [null] | S | [null] | [null] | ||
88 | 3 | 0 | male | 22.0 | 0 | 0 | A/5 21172 | 7.25 | [null] | S | [null] | [null] | ||
89 | 3 | 0 | male | [null] | 0 | 0 | 2674 | 7.225 | [null] | C | [null] | [null] | ||
90 | 3 | 0 | male | [null] | 0 | 0 | 2631 | 7.225 | [null] | C | [null] | [null] | ||
91 | 3 | 0 | male | 40.5 | 0 | 0 | C.A. 6212 | 15.1 | [null] | S | [null] | 187 | ||
92 | 3 | 0 | male | 40.5 | 0 | 0 | 367232 | 7.75 | [null] | Q | [null] | 68 | ||
93 | 3 | 0 | male | 18.0 | 0 | 0 | 350036 | 7.7958 | [null] | S | [null] | [null] | ||
94 | 3 | 0 | female | [null] | 0 | 0 | 364859 | 7.75 | [null] | Q | [null] | [null] | ||
95 | 3 | 0 | male | [null] | 0 | 0 | 349254 | 7.8958 | [null] | C | [null] | [null] | ||
96 | 3 | 0 | male | 40.0 | 1 | 6 | CA 2144 | 46.9 | [null] | S | [null] | [null] | ||
97 | 3 | 0 | male | 51.0 | 0 | 0 | 21440 | 8.05 | [null] | S | [null] | [null] | ||
98 | 3 | 0 | male | 26.0 | 1 | 0 | 350025 | 7.8542 | [null] | S | [null] | [null] | ||
99 | 3 | 0 | female | [null] | 0 | 0 | 382649 | 7.75 | [null] | Q | [null] | [null] | ||
100 | 3 | 0 | male | [null] | 0 | 0 | 349220 | 7.8958 | [null] | S | [null] | [null] |
To create a vDataFrame using a customized relation, specify the SQL query for that relation as the argument:
[4]:
vp.vDataFrame("SELECT pclass, AVG(survived) AS survived FROM titanic GROUP BY 1")
[4]:
123 pclassInteger | 123 survivedFloat(22) | |
1 | 2 | 0.416988416988417 |
2 | 1 | 0.612179487179487 |
3 | 3 | 0.227752639517345 |
For more examples of creating vDataFrames, see the vDataFrame reference page.
Loading and processing in-memory vs. in-database#
The following examples demonstrate the performance advantages of loading and processing data in-database versus in-memory.
First, we download the Expedia dataset from Kaggle and then load it into Vertica:
NOTE: You might need to change the value of the
path
agrument for theread_csv
function, depending on where your data file is located.
[ ]:
vp.read_csv("train.csv", schema = "public", parse_nrows = 20000000)
Once the data is loaded into the Vertica database, we can create a vDataFrame using the relation that contains the Expedia dataset:
[6]:
import time
start_time = time.time()
expedia = vp.vDataFrame("public.train")
print("elapsed time = {}".format(time.time() - start_time))
display(expedia)
elapsed time = 0.15794777870178223
📅 date_timeTimestamp(29) | 123 site_nameInteger | 123 posa_continentInteger | 123 user_location_countryInteger | 123 user_location_regionInteger | 123 user_location_cityInteger | 123 orig_destination_distanceNumeric(14) | 123 user_idInteger | 123 is_mobileInteger | 123 is_packageInteger | 123 channelInteger | 📅 srch_ciDate | 📅 srch_coDate | 123 srch_adults_cntInteger | 123 srch_children_cntInteger | 123 srch_rm_cntInteger | 123 srch_destination_idInteger | 123 srch_destination_type_idInteger | 123 is_bookingInteger | 123 cntInteger | 123 hotel_continentInteger | 123 hotel_countryInteger | 123 hotel_marketInteger | 123 hotel_clusterInteger | |
1 | 2013-01-07 00:00:28 | 2 | 3 | 66 | 356 | 4779 | 4948.1861 | 593375 | 0 | 0 | 9 | 2013-05-09 | 2013-05-10 | 4 | 0 | 2 | 2763 | 3 | 0 | 2 | 6 | 70 | 1901 | 2 |
2 | 2013-01-07 00:01:34 | 2 | 3 | 66 | 356 | 4779 | 4948.1861 | 593375 | 0 | 0 | 9 | 2013-05-09 | 2013-05-10 | 2 | 0 | 1 | 2763 | 3 | 0 | 3 | 6 | 70 | 1901 | 2 |
3 | 2013-01-07 00:01:45 | 2 | 3 | 66 | 294 | 7976 | 509.4848 | 1197968 | 0 | 0 | 9 | 2013-01-16 | 2013-01-18 | 1 | 0 | 1 | 5811 | 3 | 0 | 4 | 2 | 50 | 350 | 76 |
4 | 2013-01-07 00:01:53 | 24 | 2 | 3 | 49 | 19105 | [null] | 1174819 | 0 | 0 | 9 | 2013-04-25 | 2013-04-28 | 2 | 0 | 1 | 14908 | 1 | 0 | 3 | 3 | 88 | 1032 | 5 |
5 | 2013-01-07 00:02:24 | 11 | 3 | 205 | 155 | 14703 | 796.3281 | 1128575 | 0 | 0 | 9 | 2013-01-19 | 2013-01-22 | 1 | 0 | 1 | 25064 | 6 | 0 | 1 | 2 | 50 | 1230 | 4 |
6 | 2013-01-07 00:02:43 | 2 | 3 | 66 | 462 | 14703 | 2605.4093 | 614322 | 0 | 0 | 1 | 2013-01-08 | 2013-01-15 | 2 | 0 | 1 | 13094 | 3 | 0 | 1 | 2 | 50 | 212 | 40 |
7 | 2013-01-07 00:02:43 | 24 | 2 | 66 | 346 | 31371 | 384.0182 | 1162059 | 1 | 0 | 2 | 2013-01-11 | 2013-01-14 | 3 | 1 | 1 | 14985 | 1 | 0 | 1 | 2 | 50 | 1241 | 99 |
8 | 2013-01-07 00:03:35 | 2 | 3 | 70 | 47 | 14566 | [null] | 598076 | 0 | 0 | 9 | 2013-01-12 | 2013-01-13 | 6 | 1 | 3 | 8220 | 1 | 0 | 2 | 3 | 182 | 46 | 46 |
9 | 2013-01-07 00:03:41 | 2 | 3 | 66 | 174 | 21177 | 5713.6394 | 13796 | 0 | 0 | 9 | 2013-01-19 | 2013-01-26 | 1 | 0 | 1 | 8821 | 1 | 0 | 1 | 6 | 17 | 30 | 67 |
10 | 2013-01-07 00:04:33 | 2 | 3 | 3 | 50 | 31800 | [null] | 192004 | 0 | 0 | 9 | 2013-02-24 | 2013-02-28 | 1 | 1 | 1 | 22238 | 6 | 0 | 2 | 6 | 77 | 2 | 6 |
11 | 2013-01-07 00:05:04 | 2 | 3 | 3 | 50 | 31800 | [null] | 192004 | 0 | 0 | 9 | 2013-02-24 | 2013-02-28 | 2 | 1 | 1 | 22238 | 6 | 0 | 1 | 6 | 77 | 2 | 6 |
12 | 2013-01-07 00:05:06 | 37 | 1 | 69 | 761 | 41949 | [null] | 1080476 | 0 | 1 | 9 | 2013-05-29 | 2013-06-05 | 2 | 0 | 1 | 7635 | 3 | 0 | 1 | 2 | 50 | 675 | 83 |
13 | 2013-01-07 00:05:39 | 37 | 1 | 69 | 761 | 41949 | [null] | 1018895 | 0 | 0 | 0 | 2013-09-08 | 2013-09-10 | 2 | 0 | 1 | 27215 | 6 | 0 | 3 | 2 | 50 | 645 | 91 |
14 | 2013-01-07 00:06:35 | 24 | 2 | 66 | 346 | 31371 | 384.2999 | 1162059 | 1 | 0 | 2 | 2013-01-11 | 2013-01-14 | 3 | 1 | 1 | 14985 | 1 | 0 | 1 | 2 | 50 | 1241 | 40 |
15 | 2013-01-07 00:07:17 | 34 | 3 | 205 | 155 | 14703 | 994.4014 | 332853 | 1 | 0 | 9 | 2013-03-24 | 2013-03-28 | 2 | 2 | 1 | 12206 | 6 | 0 | 5 | 2 | 50 | 628 | 1 |
16 | 2013-01-07 00:09:19 | 11 | 3 | 205 | 155 | 53078 | 990.6027 | 564664 | 0 | 1 | 9 | 2013-03-17 | 2013-03-18 | 2 | 2 | 1 | 8250 | 1 | 0 | 4 | 2 | 50 | 628 | 49 |
17 | 2013-01-07 00:09:30 | 2 | 3 | 66 | 294 | 7976 | 509.8404 | 1197968 | 0 | 0 | 9 | 2013-01-16 | 2013-01-18 | 1 | 0 | 1 | 5811 | 3 | 0 | 3 | 2 | 50 | 350 | 57 |
18 | 2013-01-07 00:11:00 | 2 | 3 | 66 | 174 | 21316 | 229.342 | 631355 | 0 | 0 | 0 | 2013-01-18 | 2013-01-21 | 2 | 1 | 1 | 3361 | 3 | 0 | 1 | 2 | 50 | 351 | 89 |
19 | 2013-01-07 00:11:27 | 2 | 3 | 66 | 174 | 14987 | 5992.4521 | 893627 | 0 | 1 | 9 | 2013-05-22 | 2013-05-30 | 2 | 0 | 1 | 8788 | 1 | 0 | 1 | 6 | 77 | 2 | 11 |
20 | 2013-01-07 00:11:57 | 2 | 3 | 66 | 174 | 38899 | 341.9762 | 53327 | 1 | 0 | 9 | 2013-01-07 | 2013-01-10 | 2 | 2 | 1 | 24374 | 6 | 0 | 3 | 2 | 50 | 627 | 32 |
21 | 2013-01-07 00:14:19 | 24 | 2 | 3 | 50 | 53819 | [null] | 233534 | 0 | 0 | 1 | 2013-01-08 | 2013-01-09 | 1 | 1 | 1 | 26729 | 6 | 0 | 2 | 6 | 105 | 12 | 75 |
22 | 2013-01-07 00:14:20 | 13 | 1 | 63 | 480 | 38276 | 11447.1445 | 1012948 | 0 | 0 | 9 | 2013-02-08 | 2013-02-09 | 1 | 1 | 1 | 8217 | 1 | 0 | 1 | 0 | 34 | 1396 | 2 |
23 | 2013-01-07 00:14:47 | 2 | 3 | 66 | 246 | 50661 | 2756.3593 | 283970 | 0 | 0 | 0 | 2013-05-23 | 2013-05-30 | 2 | 0 | 1 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 54 |
24 | 2013-01-07 00:17:05 | 2 | 3 | 93 | 71 | 15362 | [null] | 568988 | 0 | 0 | 9 | 2013-01-26 | 2013-01-30 | 2 | 1 | 1 | 3011 | 3 | 0 | 2 | 6 | 107 | 822 | 58 |
25 | 2013-01-07 00:17:25 | 34 | 3 | 205 | 155 | 14703 | 995.7923 | 332853 | 1 | 0 | 9 | 2013-03-24 | 2013-03-28 | 2 | 2 | 1 | 12206 | 6 | 0 | 1 | 2 | 50 | 628 | 1 |
26 | 2013-01-07 00:18:38 | 2 | 3 | 66 | 174 | 42881 | 379.1427 | 80737 | 0 | 0 | 9 | 2013-03-21 | 2013-03-24 | 2 | 0 | 1 | 8279 | 1 | 0 | 3 | 2 | 50 | 1230 | 94 |
27 | 2013-01-07 00:18:52 | 34 | 3 | 205 | 155 | 14703 | 994.6585 | 332853 | 1 | 0 | 9 | 2013-03-24 | 2013-03-28 | 2 | 2 | 1 | 12206 | 6 | 0 | 1 | 2 | 50 | 628 | 79 |
28 | 2013-01-07 00:20:54 | 2 | 3 | 149 | 290 | 33116 | [null] | 473260 | 0 | 0 | 9 | 2013-01-24 | 2013-01-26 | 2 | 0 | 1 | 40903 | 6 | 0 | 1 | 3 | 48 | 152 | 29 |
29 | 2013-01-07 00:23:13 | 17 | 1 | 133 | 20 | 46168 | [null] | 124561 | 0 | 0 | 3 | 2013-03-09 | 2013-03-13 | 2 | 2 | 1 | 20796 | 6 | 0 | 1 | 6 | 70 | 1904 | 25 |
30 | 2013-01-07 00:23:42 | 2 | 3 | 198 | 208 | 54488 | [null] | 68386 | 0 | 0 | 9 | 2013-03-06 | 2013-03-07 | 2 | 1 | 1 | 20118 | 6 | 0 | 1 | 3 | 5 | 44 | 85 |
31 | 2013-01-07 00:23:57 | 37 | 1 | 69 | 648 | 6514 | [null] | 1084815 | 0 | 1 | 9 | 2013-03-18 | 2013-04-01 | 2 | 0 | 1 | 8268 | 1 | 0 | 2 | 2 | 50 | 682 | 51 |
32 | 2013-01-07 00:24:11 | 37 | 1 | 69 | 574 | 32474 | [null] | 1084816 | 0 | 0 | 9 | 2013-02-24 | 2013-02-26 | 3 | 0 | 1 | 24606 | 6 | 0 | 3 | 3 | 151 | 80 | 20 |
33 | 2013-01-07 00:27:36 | 37 | 1 | 69 | 761 | 41949 | [null] | 976118 | 0 | 1 | 3 | 2013-02-15 | 2013-02-19 | 2 | 1 | 1 | 1014 | 1 | 0 | 11 | 3 | 92 | 168 | 92 |
34 | 2013-01-07 00:28:01 | 2 | 3 | 198 | 208 | 54488 | [null] | 68386 | 0 | 0 | 9 | 2013-03-06 | 2013-03-07 | 2 | 1 | 1 | 20115 | 6 | 0 | 6 | 3 | 5 | 44 | 89 |
35 | 2013-01-07 00:30:34 | 2 | 3 | 66 | 184 | 11740 | 6875.6962 | 1042547 | 0 | 0 | 7 | 2013-02-02 | 2013-02-06 | 1 | 1 | 1 | 788 | 3 | 0 | 3 | 3 | 0 | 1500 | 8 |
36 | 2013-01-07 00:32:16 | 8 | 4 | 77 | 462 | 3492 | 10465.1087 | 1170845 | 0 | 0 | 1 | 2013-05-16 | 2013-05-20 | 1 | 0 | 1 | 8218 | 1 | 0 | 1 | 2 | 50 | 743 | 50 |
37 | 2013-01-07 00:32:44 | 24 | 2 | 3 | 50 | 5703 | [null] | 233286 | 1 | 0 | 1 | 2013-05-25 | 2013-05-27 | 1 | 0 | 1 | 8745 | 1 | 0 | 1 | 6 | 204 | 27 | 92 |
38 | 2013-01-07 00:34:17 | 2 | 3 | 69 | 879 | 25671 | [null] | 866032 | 0 | 0 | 9 | 2013-02-19 | 2013-02-23 | 2 | 0 | 1 | 18774 | 1 | 0 | 1 | 3 | 99 | 1239 | 67 |
39 | 2013-01-07 00:35:19 | 2 | 3 | 66 | 189 | 17017 | 2194.9441 | 204066 | 1 | 1 | 9 | 2013-01-31 | 2013-02-11 | 1 | 0 | 1 | 8272 | 1 | 0 | 1 | 2 | 50 | 659 | 15 |
40 | 2013-01-07 00:35:29 | 37 | 1 | 69 | 574 | 32474 | [null] | 1084816 | 0 | 0 | 9 | 2013-02-24 | 2013-02-26 | 3 | 0 | 1 | 24606 | 6 | 0 | 1 | 3 | 151 | 80 | 9 |
41 | 2013-01-07 00:37:34 | 2 | 3 | 66 | 174 | 21177 | 5711.6309 | 13796 | 0 | 0 | 9 | 2013-01-19 | 2013-01-26 | 1 | 1 | 1 | 8821 | 1 | 0 | 1 | 6 | 17 | 30 | 62 |
42 | 2013-01-07 00:37:47 | 37 | 1 | 69 | 639 | 56293 | [null] | 1084371 | 0 | 0 | 0 | 2013-02-28 | 2013-03-01 | 2 | 0 | 1 | 47900 | 4 | 0 | 2 | 6 | 70 | 309 | 15 |
43 | 2013-01-07 00:37:49 | 2 | 3 | 66 | 348 | 48862 | 1093.4715 | 368084 | 0 | 1 | 3 | 2013-03-04 | 2013-03-08 | 4 | 0 | 1 | 44045 | 3 | 0 | 2 | 2 | 50 | 701 | 70 |
44 | 2013-01-07 00:38:42 | 24 | 2 | 3 | 71 | 36693 | [null] | 1168297 | 0 | 0 | 0 | 2013-01-27 | 2013-01-28 | 1 | 0 | 1 | 8806 | 1 | 0 | 1 | 6 | 105 | 22 | 64 |
45 | 2013-01-07 00:40:01 | 2 | 3 | 66 | 189 | 17017 | 2191.5759 | 204066 | 1 | 1 | 9 | 2013-01-31 | 2013-02-11 | 1 | 0 | 1 | 8272 | 1 | 0 | 1 | 2 | 50 | 659 | 50 |
46 | 2013-01-07 00:41:02 | 24 | 2 | 3 | 51 | 9527 | [null] | 259991 | 0 | 0 | 1 | 2013-05-05 | 2013-05-06 | 1 | 0 | 1 | 23507 | 6 | 1 | 1 | 6 | 70 | 19 | 9 |
47 | 2013-01-07 00:41:26 | 2 | 3 | 66 | 448 | 45508 | 762.6629 | 1159130 | 0 | 1 | 9 | 2013-04-17 | 2013-04-25 | 2 | 0 | 1 | 8268 | 1 | 0 | 5 | 2 | 50 | 682 | 95 |
48 | 2013-01-07 00:42:23 | 2 | 3 | 66 | 348 | 48862 | 1093.561 | 368084 | 0 | 1 | 3 | 2013-03-04 | 2013-03-08 | 4 | 0 | 1 | 44045 | 3 | 0 | 11 | 2 | 50 | 701 | 41 |
49 | 2013-01-07 00:43:26 | 2 | 3 | 149 | 290 | 33116 | [null] | 692110 | 0 | 1 | 9 | 2013-02-22 | 2013-02-28 | 2 | 0 | 1 | 1725 | 1 | 0 | 2 | 3 | 182 | 1493 | 53 |
50 | 2013-01-07 00:44:48 | 24 | 2 | 3 | 48 | 8158 | [null] | 1156005 | 0 | 0 | 9 | 2013-02-09 | 2013-02-14 | 2 | 0 | 1 | 20225 | 6 | 0 | 1 | 3 | 182 | 46 | 5 |
51 | 2013-01-07 00:45:25 | 2 | 3 | 66 | 174 | 26259 | 352.4308 | 1114339 | 0 | 0 | 9 | 2013-02-15 | 2013-02-17 | 2 | 0 | 1 | 12175 | 6 | 0 | 1 | 2 | 50 | 366 | 13 |
52 | 2013-01-07 00:46:18 | 2 | 3 | 66 | 258 | 42500 | 1643.056 | 616447 | 0 | 0 | 9 | 2013-02-02 | 2013-02-03 | 3 | 1 | 1 | 29211 | 4 | 0 | 1 | 2 | 50 | 1630 | 3 |
53 | 2013-01-07 00:49:27 | 11 | 3 | 205 | 385 | 50121 | [null] | 762423 | 0 | 0 | 9 | 2013-05-07 | 2013-05-12 | 2 | 1 | 1 | 12241 | 6 | 0 | 6 | 6 | 204 | 27 | 58 |
54 | 2013-01-07 00:50:03 | 2 | 3 | 66 | 174 | 42586 | 45.2179 | 844983 | 0 | 0 | 9 | 2013-02-16 | 2013-02-17 | 2 | 0 | 1 | 8835 | 1 | 0 | 2 | 2 | 50 | 357 | 34 |
55 | 2013-01-07 00:51:48 | 2 | 3 | 1 | 145 | 15028 | [null] | 186070 | 1 | 0 | 9 | 2013-01-07 | 2013-01-08 | 2 | 0 | 1 | 12411 | 5 | 0 | 5 | 6 | 105 | 29 | 8 |
56 | 2013-01-07 00:53:55 | 17 | 1 | 63 | 480 | 38276 | 139.1247 | 58305 | 1 | 1 | 0 | 2013-01-11 | 2013-02-11 | 1 | 0 | 1 | 22076 | 6 | 0 | 3 | 6 | 46 | 1463 | 83 |
57 | 2013-01-07 00:54:09 | 2 | 3 | 66 | 448 | 29872 | 1283.2022 | 1118612 | 0 | 1 | 9 | 2013-01-13 | 2013-01-17 | 2 | 0 | 1 | 8791 | 1 | 0 | 1 | 4 | 8 | 110 | 52 |
58 | 2013-01-07 00:54:27 | 24 | 2 | 3 | 48 | 25460 | [null] | 19385 | 0 | 0 | 9 | 2013-08-25 | 2013-08-26 | 2 | 2 | 1 | 783 | 1 | 0 | 3 | 3 | 5 | 98 | 36 |
59 | 2013-01-07 00:55:04 | 2 | 3 | 66 | 448 | 45508 | 768.593 | 1159130 | 0 | 1 | 9 | 2013-04-17 | 2013-04-25 | 2 | 0 | 1 | 8268 | 1 | 0 | 1 | 2 | 50 | 682 | 91 |
60 | 2013-01-07 00:55:05 | 2 | 3 | 66 | 189 | 17017 | 2188.5618 | 204066 | 1 | 1 | 9 | 2013-01-31 | 2013-02-11 | 1 | 0 | 1 | 8272 | 1 | 0 | 1 | 2 | 50 | 659 | 42 |
61 | 2013-01-07 00:56:07 | 24 | 2 | 3 | 48 | 25460 | [null] | 19385 | 0 | 0 | 9 | 2013-08-25 | 2013-08-26 | 2 | 2 | 1 | 783 | 1 | 0 | 4 | 3 | 5 | 98 | 30 |
62 | 2013-01-07 00:58:13 | 2 | 3 | 66 | 174 | 19457 | 1646.9747 | 608231 | 0 | 0 | 2 | 2013-02-21 | 2013-02-25 | 2 | 0 | 1 | 8243 | 1 | 0 | 1 | 2 | 50 | 409 | 18 |
63 | 2013-01-07 00:59:04 | 34 | 3 | 205 | 135 | 27655 | 3493.532 | 314929 | 0 | 0 | 9 | 2013-02-10 | 2013-02-16 | 1 | 0 | 1 | 12839 | 5 | 0 | 8 | 4 | 51 | 1509 | 34 |
64 | 2013-01-07 00:59:15 | 2 | 3 | 149 | 290 | 33116 | [null] | 692110 | 0 | 1 | 9 | 2013-02-22 | 2013-02-27 | 2 | 0 | 1 | 21222 | 1 | 0 | 6 | 3 | 152 | 73 | 27 |
65 | 2013-01-07 00:59:44 | 2 | 3 | 93 | 71 | 15362 | [null] | 568988 | 0 | 0 | 9 | 2013-01-30 | 2013-02-02 | 2 | 1 | 1 | 21317 | 6 | 0 | 1 | 6 | 107 | 822 | 3 |
66 | 2013-01-07 00:59:56 | 24 | 2 | 3 | 50 | 5703 | [null] | 486892 | 0 | 0 | 3 | 2013-02-08 | 2013-02-10 | 2 | 1 | 1 | 8746 | 1 | 1 | 1 | 6 | 105 | 29 | 43 |
67 | 2013-01-07 01:01:08 | 2 | 3 | 231 | 68 | 42296 | [null] | 75386 | 0 | 1 | 8 | 2013-02-11 | 2013-02-19 | 2 | 0 | 1 | 14908 | 1 | 0 | 2 | 3 | 88 | 1032 | 5 |
68 | 2013-01-07 01:02:08 | 2 | 3 | 115 | 957 | 17195 | [null] | 648231 | 0 | 0 | 9 | 2013-01-24 | 2013-01-25 | 2 | 1 | 1 | 8220 | 1 | 0 | 1 | 3 | 182 | 46 | 46 |
69 | 2013-01-07 01:02:25 | 24 | 2 | 69 | 844 | 22970 | [null] | 1153820 | 0 | 0 | 0 | 2013-01-15 | 2013-01-18 | 2 | 0 | 1 | 8253 | 1 | 0 | 1 | 6 | 70 | 19 | 50 |
70 | 2013-01-07 01:04:35 | 2 | 3 | 66 | 174 | 26232 | 2568.6493 | 61897 | 0 | 1 | 0 | 2013-01-11 | 2013-01-11 | 1 | 0 | 1 | 8267 | 1 | 0 | 1 | 2 | 50 | 675 | 10 |
71 | 2013-01-07 01:04:56 | 37 | 1 | 69 | 639 | 56293 | [null] | 1084371 | 0 | 0 | 0 | 2013-02-28 | 2013-03-02 | 2 | 0 | 1 | 47900 | 4 | 0 | 2 | 6 | 70 | 309 | 15 |
72 | 2013-01-07 01:07:37 | 24 | 2 | 3 | 83 | 22548 | [null] | 1154667 | 0 | 0 | 9 | 2013-08-04 | 2013-08-06 | 1 | 0 | 1 | 8245 | 1 | 0 | 1 | 6 | 22 | 245 | 8 |
73 | 2013-01-07 01:10:22 | 24 | 2 | 3 | 48 | 8158 | [null] | 1156005 | 0 | 0 | 9 | 2013-02-09 | 2013-02-14 | 2 | 0 | 1 | 20225 | 6 | 0 | 4 | 3 | 182 | 46 | 62 |
74 | 2013-01-07 01:11:45 | 2 | 3 | 66 | 448 | 53078 | 828.1394 | 68291 | 0 | 1 | 9 | 2013-03-07 | 2013-03-10 | 2 | 0 | 1 | 8260 | 1 | 0 | 6 | 2 | 50 | 701 | 70 |
75 | 2013-01-07 01:14:19 | 2 | 3 | 35 | 177 | 33543 | 2650.4725 | 119414 | 0 | 0 | 9 | 2013-01-17 | 2013-01-19 | 1 | 0 | 1 | 18788 | 1 | 1 | 1 | 5 | 203 | 253 | 61 |
76 | 2013-01-07 01:18:14 | 8 | 4 | 133 | 30 | 28810 | [null] | 1014729 | 0 | 1 | 1 | 2014-01-03 | 2014-01-04 | 2 | 1 | 1 | 8745 | 1 | 0 | 1 | 6 | 204 | 27 | 64 |
77 | 2013-01-07 01:18:15 | 2 | 3 | 66 | 174 | 18037 | 7445.0478 | 385172 | 0 | 0 | 9 | 2013-01-22 | 2013-01-24 | 1 | 0 | 1 | 13747 | 4 | 0 | 1 | 0 | 63 | 1258 | 95 |
78 | 2013-01-07 01:20:43 | 2 | 3 | 229 | 407 | 16373 | [null] | 217610 | 0 | 0 | 9 | 2013-01-08 | 2013-01-10 | 1 | 0 | 1 | 12014 | 1 | 0 | 2 | 2 | 50 | 644 | 15 |
79 | 2013-01-07 01:26:37 | 2 | 3 | 229 | 407 | 16373 | [null] | 217610 | 0 | 0 | 9 | 2013-01-08 | 2013-01-11 | 1 | 0 | 1 | 12014 | 1 | 1 | 1 | 2 | 50 | 644 | 15 |
80 | 2013-01-07 01:27:19 | 2 | 3 | 66 | 174 | 14752 | 342.9899 | 1043452 | 0 | 0 | 9 | 2013-03-01 | 2013-03-02 | 2 | 0 | 1 | 8279 | 1 | 1 | 1 | 2 | 50 | 1230 | 83 |
81 | 2013-01-07 01:31:05 | 2 | 3 | 66 | 174 | 36242 | 2331.5257 | 395056 | 1 | 0 | 9 | 2013-05-16 | 2013-05-18 | 2 | 0 | 1 | 12226 | 6 | 0 | 2 | 2 | 50 | 701 | 31 |
82 | 2013-01-07 01:35:49 | 24 | 2 | 69 | 844 | 22970 | [null] | 1153820 | 0 | 0 | 0 | 2013-01-15 | 2013-01-17 | 1 | 0 | 1 | 8253 | 1 | 0 | 1 | 6 | 70 | 19 | 73 |
83 | 2013-01-07 01:35:54 | 2 | 3 | 198 | 208 | 54488 | [null] | 848741 | 1 | 0 | 0 | 2013-01-07 | 2013-01-11 | 3 | 0 | 1 | 45687 | 6 | 0 | 2 | 3 | 48 | 153 | 46 |
84 | 2013-01-07 01:37:58 | 2 | 3 | 23 | 78 | 23278 | [null] | 800769 | 0 | 0 | 9 | 2013-01-11 | 2013-01-13 | 2 | 0 | 1 | 8282 | 1 | 0 | 2 | 3 | 126 | 232 | 44 |
85 | 2013-01-07 01:38:03 | 2 | 3 | 66 | 348 | 48862 | 1093.6026 | 368084 | 0 | 1 | 3 | 2013-03-04 | 2013-03-08 | 4 | 0 | 1 | 44045 | 3 | 0 | 2 | 2 | 50 | 701 | 84 |
86 | 2013-01-07 01:38:26 | 2 | 3 | 70 | 47 | 14566 | [null] | 756106 | 0 | 0 | 9 | 2013-01-11 | 2013-01-17 | 1 | 0 | 1 | 20225 | 6 | 0 | 3 | 3 | 182 | 46 | 3 |
87 | 2013-01-07 01:47:42 | 37 | 1 | 66 | 174 | 5938 | 5.0992 | 458129 | 0 | 0 | 0 | 2013-01-07 | 2013-01-11 | 1 | 1 | 1 | 24964 | 6 | 0 | 2 | 2 | 50 | 368 | 47 |
88 | 2013-01-07 01:48:16 | 2 | 3 | 1 | 395 | 29254 | 916.4085 | 901162 | 0 | 0 | 9 | 2013-01-16 | 2013-01-18 | 2 | 0 | 1 | 19520 | 1 | 0 | 1 | 6 | 77 | 154 | 46 |
89 | 2013-01-07 01:51:26 | 37 | 1 | 69 | 761 | 41949 | [null] | 1084830 | 1 | 0 | 9 | 2013-02-14 | 2013-02-15 | 2 | 0 | 1 | 42219 | 1 | 0 | 1 | 3 | 82 | 329 | 67 |
90 | 2013-01-07 01:51:41 | 2 | 3 | 70 | 47 | 14566 | [null] | 132231 | 0 | 0 | 9 | 2013-02-25 | 2013-03-01 | 3 | 0 | 3 | 40822 | 4 | 0 | 1 | 3 | 99 | 88 | 61 |
91 | 2013-01-07 01:53:35 | 24 | 2 | 3 | 64 | 22203 | [null] | 255311 | 0 | 0 | 1 | 2013-02-24 | 2013-02-28 | 1 | 0 | 1 | 8747 | 1 | 0 | 1 | 3 | 106 | 107 | 3 |
92 | 2013-01-07 01:58:26 | 24 | 2 | 3 | 50 | 5703 | [null] | 4769 | 0 | 0 | 4 | 2013-01-21 | 2013-01-22 | 1 | 0 | 1 | 8741 | 1 | 0 | 2 | 6 | 144 | 13 | 47 |
93 | 2013-01-07 01:58:43 | 2 | 3 | 66 | 246 | 50661 | 2757.8656 | 283970 | 0 | 0 | 0 | 2013-05-26 | 2013-05-28 | 1 | 0 | 1 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 88 |
94 | 2013-01-07 02:01:59 | 2 | 3 | 23 | 78 | 23278 | [null] | 800769 | 0 | 0 | 9 | 2013-01-11 | 2013-01-13 | 2 | 0 | 1 | 8282 | 1 | 0 | 1 | 3 | 126 | 232 | 64 |
95 | 2013-01-07 02:04:19 | 2 | 3 | 66 | 174 | 14752 | 5442.2807 | 914984 | 0 | 1 | 3 | 2013-02-09 | 2013-03-10 | 2 | 0 | 1 | 12213 | 6 | 0 | 1 | 6 | 70 | 19 | 46 |
96 | 2013-01-07 02:04:54 | 2 | 3 | 229 | 407 | 16373 | [null] | 342897 | 0 | 0 | 9 | 2013-01-14 | 2013-01-16 | 1 | 0 | 1 | 8291 | 1 | 0 | 2 | 2 | 50 | 191 | 59 |
97 | 2013-01-07 02:07:35 | 2 | 3 | 93 | 71 | 15362 | [null] | 777027 | 0 | 0 | 9 | 2013-02-16 | 2013-02-20 | 1 | 0 | 1 | 468 | 1 | 0 | 6 | 3 | 48 | 153 | 42 |
98 | 2013-01-07 02:09:00 | 2 | 3 | 66 | 246 | 28491 | 2377.8348 | 355007 | 0 | 0 | 4 | 2013-01-17 | 2013-01-18 | 3 | 0 | 1 | 24189 | 6 | 0 | 1 | 2 | 50 | 360 | 48 |
99 | 2013-01-07 02:09:57 | 2 | 3 | 235 | 13 | 53632 | [null] | 627910 | 0 | 0 | 9 | 2013-01-17 | 2013-01-18 | 1 | 0 | 1 | 15655 | 5 | 1 | 1 | 3 | 1 | 142 | 38 |
100 | 2013-01-07 02:13:17 | 34 | 3 | 133 | 14 | 54307 | [null] | 974316 | 0 | 0 | 5 | 2013-02-01 | 2013-02-03 | 2 | 0 | 1 | 3059 | 6 | 0 | 1 | 6 | 204 | 1786 | 22 |
The vDataFrame was created in about a second. All the data—about 4GB—is stored in Vertica, requiring no in-memory data loading.
Now, to compare the above result with in-memory loading, we load about half the dataset into pandas:
NOTE: This process is expensive on local machines, so avoid running the following code if your computer has less than 2GB of memory.
[7]:
import pandas as pd
L_nrows = [10000, 100000, 1000000, 2000000, 5000000, 10000000, 20000000]
L_time = []
for nrows in L_nrows:
start_time = time.time()
expedia_df = pd.read_csv("train.csv", nrows = nrows)
elapsed_time = time.time() - start_time
L_time.append(elapsed_time)
print("nrows = {}; elapsed time = {}".format(nrows, elapsed_time))
nrows = 10000; elapsed time = 0.03764772415161133
nrows = 100000; elapsed time = 0.19388937950134277
nrows = 1000000; elapsed time = 1.8165876865386963
nrows = 2000000; elapsed time = 3.613091230392456
nrows = 5000000; elapsed time = 8.884109735488892
nrows = 10000000; elapsed time = 19.819123029708862
nrows = 20000000; elapsed time = 62.796350717544556
It took about a minute to load half the dataset into memory. Compared with the one second required to create the vDataFrame on the entire dataset in-database, in-memory loading is magnitudes more expensive. Loading data into pandas is quite fast when the data volume is low (less than some MB), but as the size of the dataset increases, the load time can become exponentially more expensive, as seen in the following plot:
[8]:
import matplotlib.pyplot as plt
plt.plot(L_nrows, L_time)
plt.show()
Even after the data is loaded into memory, the performance is very slow. The following example removes non-numeric columns from the dataset, then computes a correlation matrix:
[9]:
columns_to_drop = ['date_time', 'srch_ci', 'srch_co']
expedia_df = expedia_df.drop(columns_to_drop, axis=1)
[10]:
start_time = time.time()
expedia_df.corr()
print(f"elapsed time = {time.time() - start_time}")
elapsed time = 20.517507791519165
The operation took about 20 seconds. Let’s compare the performance in-database using a vDataFrame to compute the correlation matrix of the entire dataset:
[11]:
# Remove non-numeric columns
expedia.drop(columns = ['date_time', 'srch_ci', 'srch_co'])
[11]:
123 site_nameInteger | 123 posa_continentInteger | 123 user_location_countryInteger | 123 user_location_regionInteger | 123 user_location_cityInteger | 123 orig_destination_distanceNumeric(14) | 123 user_idInteger | 123 is_mobileInteger | 123 is_packageInteger | 123 channelInteger | 123 srch_adults_cntInteger | 123 srch_children_cntInteger | 123 srch_rm_cntInteger | 123 srch_destination_idInteger | 123 srch_destination_type_idInteger | 123 is_bookingInteger | 123 cntInteger | 123 hotel_continentInteger | 123 hotel_countryInteger | 123 hotel_marketInteger | 123 hotel_clusterInteger | |
1 | 2 | 3 | 66 | 448 | 50947 | 1892.1183 | 1082027 | 0 | 0 | 9 | 3 | 0 | 1 | 12811 | 5 | 0 | 1 | 4 | 57 | 172 | 82 |
2 | 24 | 2 | 3 | 50 | 5703 | [null] | 229916 | 0 | 0 | 4 | 1 | 0 | 1 | 8744 | 1 | 0 | 1 | 3 | 106 | 108 | 85 |
3 | 2 | 3 | 66 | 442 | 18617 | [null] | 1080633 | 0 | 1 | 9 | 1 | 1 | 1 | 8223 | 1 | 0 | 1 | 2 | 50 | 690 | 98 |
4 | 2 | 3 | 66 | 448 | 24848 | 2270.5665 | 445852 | 0 | 0 | 9 | 1 | 0 | 1 | 8278 | 1 | 0 | 1 | 2 | 50 | 368 | 68 |
5 | 40 | 3 | 66 | 314 | 4868 | 3831.1887 | 775452 | 0 | 1 | 5 | 0 | 0 | 1 | 8745 | 1 | 1 | 1 | 6 | 204 | 27 | 5 |
6 | 2 | 3 | 66 | 448 | 50947 | 790.8617 | 656916 | 0 | 1 | 1 | 2 | 1 | 1 | 14840 | 1 | 1 | 1 | 2 | 50 | 1629 | 90 |
7 | 2 | 3 | 66 | 435 | 14812 | [null] | 1120833 | 0 | 1 | 9 | 0 | 0 | 1 | 18046 | 1 | 0 | 1 | 6 | 105 | 1808 | 43 |
8 | 37 | 1 | 198 | 153 | 17956 | [null] | 695183 | 0 | 0 | 9 | 1 | 0 | 1 | 8264 | 1 | 0 | 1 | 5 | 108 | 1568 | 38 |
9 | 2 | 3 | 115 | 957 | 17195 | [null] | 932702 | 0 | 0 | 9 | 1 | 0 | 1 | 268 | 1 | 0 | 2 | 3 | 99 | 1033 | 60 |
10 | 2 | 3 | 66 | 348 | 48862 | 206.7024 | 829823 | 0 | 0 | 1 | 1 | 0 | 1 | 17918 | 4 | 0 | 2 | 2 | 50 | 191 | 69 |
11 | 34 | 3 | 205 | 354 | 25315 | 11.6801 | 522574 | 0 | 0 | 9 | 2 | 1 | 1 | 12951 | 5 | 0 | 2 | 2 | 198 | 397 | 16 |
12 | 2 | 3 | 69 | 751 | 32273 | [null] | 44517 | 0 | 1 | 1 | 1 | 0 | 1 | 27912 | 6 | 0 | 1 | 6 | 70 | 233 | 5 |
13 | 2 | 3 | 66 | 335 | 34369 | [null] | 407113 | 0 | 1 | 1 | 2 | 0 | 1 | 8791 | 1 | 0 | 2 | 4 | 8 | 110 | 34 |
14 | 2 | 3 | 66 | 348 | 47997 | 2235.2638 | 370921 | 0 | 1 | 9 | 2 | 0 | 1 | 12206 | 6 | 0 | 4 | 2 | 50 | 628 | 45 |
15 | 34 | 3 | 205 | 354 | 33567 | 2360.1619 | 506148 | 1 | 1 | 9 | 2 | 0 | 1 | 11439 | 1 | 0 | 1 | 4 | 163 | 1503 | 65 |
16 | 2 | 3 | 154 | 196 | 38140 | [null] | 997119 | 0 | 0 | 9 | 1 | 0 | 1 | 22865 | 6 | 0 | 1 | 4 | 93 | 2086 | 5 |
17 | 2 | 3 | 133 | 20 | 3392 | [null] | 1115968 | 0 | 0 | 9 | 2 | 0 | 1 | 21662 | 1 | 0 | 1 | 6 | 105 | 1835 | 67 |
18 | 2 | 3 | 66 | 435 | 14812 | [null] | 1120833 | 0 | 1 | 9 | 0 | 0 | 1 | 18046 | 1 | 0 | 1 | 6 | 105 | 1808 | 22 |
19 | 11 | 3 | 205 | 385 | 46963 | 101.4726 | 270476 | 0 | 0 | 9 | 1 | 1 | 1 | 8270 | 1 | 0 | 2 | 2 | 198 | 390 | 42 |
20 | 2 | 3 | 198 | 153 | 17956 | [null] | 1037184 | 0 | 0 | 1 | 2 | 0 | 1 | 8808 | 1 | 0 | 1 | 6 | 169 | 28 | 64 |
21 | 34 | 3 | 205 | 354 | 21728 | 218.9421 | 38602 | 0 | 0 | 9 | 2 | 0 | 1 | 26022 | 6 | 0 | 2 | 2 | 198 | 397 | 68 |
22 | 2 | 3 | 66 | 337 | 21707 | 1203.3684 | 57440 | 0 | 0 | 3 | 2 | 0 | 1 | 28514 | 6 | 0 | 2 | 2 | 50 | 1457 | 55 |
23 | 2 | 3 | 66 | 220 | 3781 | 496.483 | 876585 | 0 | 0 | 9 | 2 | 0 | 1 | 26105 | 4 | 1 | 1 | 2 | 50 | 678 | 25 |
24 | 11 | 3 | 205 | 354 | 20788 | [null] | 901429 | 0 | 0 | 4 | 2 | 0 | 1 | 12951 | 5 | 0 | 1 | 2 | 198 | 397 | 19 |
25 | 2 | 3 | 133 | 20 | 3392 | [null] | 1115968 | 0 | 0 | 9 | 2 | 0 | 1 | 21662 | 1 | 0 | 1 | 6 | 105 | 1835 | 44 |
26 | 2 | 3 | 66 | 311 | 42961 | 162.5475 | 887966 | 0 | 0 | 9 | 1 | 0 | 1 | 45326 | 6 | 0 | 1 | 2 | 50 | 675 | 21 |
27 | 2 | 3 | 66 | 442 | 32693 | [null] | 504767 | 0 | 0 | 9 | 1 | 0 | 1 | 8278 | 1 | 0 | 1 | 2 | 50 | 368 | 21 |
28 | 2 | 3 | 69 | 750 | 1393 | [null] | 573693 | 0 | 0 | 9 | 2 | 0 | 1 | 8213 | 1 | 0 | 1 | 6 | 68 | 275 | 29 |
29 | 2 | 3 | 66 | 174 | 15081 | 6190.7912 | 352516 | 0 | 0 | 9 | 2 | 1 | 1 | 531 | 3 | 0 | 1 | 6 | 105 | 12 | 22 |
30 | 2 | 3 | 66 | 348 | 12318 | [null] | 189555 | 0 | 0 | 9 | 2 | 3 | 1 | 11361 | 1 | 0 | 1 | 2 | 50 | 591 | 18 |
31 | 2 | 3 | 66 | 314 | 45853 | 999.6031 | 678175 | 0 | 1 | 9 | 1 | 0 | 1 | 8266 | 1 | 0 | 1 | 2 | 50 | 411 | 95 |
32 | 2 | 3 | 12 | 790 | 53139 | [null] | 769993 | 0 | 0 | 9 | 2 | 0 | 1 | 8220 | 1 | 0 | 1 | 3 | 182 | 46 | 58 |
33 | 2 | 3 | 66 | 220 | 15928 | 403.5242 | 119685 | 0 | 1 | 9 | 4 | 0 | 2 | 11923 | 1 | 0 | 2 | 4 | 80 | 204 | 84 |
34 | 2 | 3 | 66 | 348 | 48862 | 1098.0557 | 115672 | 0 | 0 | 9 | 1 | 0 | 1 | 11853 | 1 | 0 | 3 | 2 | 50 | 702 | 69 |
35 | 2 | 3 | 66 | 318 | 17329 | 6555.1395 | 700832 | 0 | 0 | 9 | 1 | 0 | 1 | 20324 | 6 | 0 | 1 | 3 | 126 | 232 | 58 |
36 | 2 | 3 | 66 | 318 | 23208 | 7127.5052 | 179440 | 0 | 0 | 1 | 1 | 0 | 1 | 19950 | 6 | 0 | 1 | 3 | 99 | 88 | 58 |
37 | 2 | 3 | 66 | 220 | 2086 | 5853.7925 | 510447 | 0 | 0 | 1 | 1 | 0 | 1 | 1198 | 3 | 0 | 2 | 6 | 127 | 1281 | 67 |
38 | 2 | 3 | 66 | 220 | 4948 | 550.285 | 1014337 | 0 | 0 | 0 | 2 | 0 | 1 | 11319 | 1 | 0 | 1 | 2 | 50 | 681 | 32 |
39 | 2 | 3 | 66 | 448 | 31799 | 1262.8339 | 85412 | 0 | 1 | 9 | 3 | 1 | 1 | 11938 | 1 | 0 | 3 | 4 | 8 | 126 | 65 |
40 | 2 | 3 | 66 | 348 | 48862 | 2234.3926 | 694356 | 0 | 0 | 1 | 2 | 0 | 1 | 8250 | 1 | 1 | 1 | 2 | 50 | 628 | 1 |
41 | 2 | 3 | 66 | 337 | 6085 | 1538.8677 | 1115917 | 0 | 1 | 9 | 2 | 0 | 1 | 8791 | 1 | 0 | 1 | 4 | 8 | 110 | 65 |
42 | 24 | 2 | 3 | 50 | 5703 | [null] | 1176551 | 0 | 0 | 2 | 1 | 0 | 1 | 4842 | 3 | 0 | 1 | 2 | 50 | 213 | 90 |
43 | 11 | 3 | 205 | 411 | 50761 | 2.0314 | 870907 | 0 | 0 | 9 | 1 | 0 | 1 | 12939 | 5 | 0 | 1 | 2 | 198 | 392 | 83 |
44 | 2 | 3 | 66 | 467 | 14721 | 106.7718 | 363772 | 0 | 0 | 9 | 3 | 1 | 1 | 12189 | 6 | 0 | 1 | 2 | 50 | 637 | 47 |
45 | 11 | 3 | 87 | 59 | 5152 | [null] | 663969 | 1 | 1 | 2 | 1 | 0 | 1 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 79 |
46 | 2 | 3 | 66 | 348 | 48862 | 5019.262 | 255366 | 0 | 0 | 9 | 1 | 0 | 1 | 16312 | 5 | 0 | 1 | 6 | 208 | 1418 | 67 |
47 | 2 | 3 | 66 | 356 | 32467 | [null] | 158074 | 0 | 1 | 2 | 2 | 0 | 1 | 11938 | 1 | 0 | 3 | 4 | 8 | 126 | 89 |
48 | 2 | 3 | 66 | 442 | 5875 | 105.3705 | 456626 | 0 | 0 | 0 | 2 | 2 | 1 | 12453 | 5 | 0 | 2 | 2 | 50 | 406 | 94 |
49 | 24 | 2 | 3 | 50 | 5703 | [null] | 1176551 | 0 | 0 | 2 | 1 | 0 | 1 | 4842 | 3 | 0 | 1 | 2 | 50 | 213 | 72 |
50 | 2 | 3 | 182 | 416 | 32100 | 5926.9012 | 260006 | 0 | 0 | 9 | 2 | 0 | 1 | 8740 | 1 | 0 | 1 | 6 | 105 | 12 | 25 |
51 | 2 | 3 | 48 | 23 | 14308 | [null] | 163518 | 0 | 0 | 9 | 2 | 0 | 2 | 8267 | 1 | 0 | 2 | 2 | 50 | 675 | 56 |
52 | 2 | 3 | 66 | 174 | 15081 | 6191.0227 | 352516 | 0 | 0 | 9 | 2 | 1 | 1 | 531 | 3 | 0 | 1 | 6 | 105 | 12 | 86 |
53 | 2 | 3 | 66 | 348 | 48862 | 3461.7653 | 776382 | 0 | 0 | 9 | 2 | 0 | 1 | 12210 | 6 | 1 | 1 | 6 | 70 | 19 | 25 |
54 | 2 | 3 | 66 | 462 | 17112 | 2245.662 | 233132 | 0 | 0 | 9 | 2 | 0 | 1 | 35597 | 4 | 0 | 1 | 2 | 50 | 502 | 77 |
55 | 2 | 3 | 66 | 226 | 42300 | 1431.3094 | 570235 | 0 | 1 | 9 | 2 | 0 | 1 | 11439 | 1 | 0 | 10 | 4 | 163 | 1503 | 65 |
56 | 2 | 3 | 66 | 348 | 12318 | [null] | 189555 | 0 | 0 | 9 | 2 | 3 | 1 | 24581 | 6 | 0 | 1 | 2 | 50 | 1378 | 91 |
57 | 2 | 3 | 182 | 363 | 53034 | 3009.7962 | 565982 | 0 | 0 | 9 | 4 | 2 | 1 | 8268 | 1 | 0 | 1 | 2 | 50 | 682 | 91 |
58 | 2 | 3 | 66 | 220 | 15257 | 332.5132 | 449990 | 0 | 0 | 9 | 1 | 0 | 1 | 42549 | 6 | 0 | 1 | 2 | 50 | 701 | 91 |
59 | 23 | 1 | 1 | 395 | 29254 | 890.9952 | 811835 | 0 | 1 | 9 | 2 | 2 | 1 | 53533 | 3 | 0 | 1 | 6 | 70 | 19 | 98 |
60 | 2 | 3 | 66 | 258 | 4687 | 1958.2962 | 511368 | 1 | 1 | 9 | 2 | 0 | 1 | 11439 | 1 | 0 | 2 | 4 | 163 | 1503 | 65 |
61 | 2 | 3 | 66 | 442 | 76 | 1957.7993 | 1089410 | 1 | 1 | 0 | 2 | 0 | 1 | 8816 | 1 | 0 | 1 | 4 | 163 | 1935 | 23 |
62 | 2 | 3 | 66 | 258 | 4687 | 762.9222 | 1147729 | 0 | 0 | 9 | 2 | 0 | 1 | 8793 | 1 | 0 | 1 | 2 | 50 | 677 | 73 |
63 | 2 | 3 | 133 | 20 | 46962 | [null] | 845483 | 0 | 0 | 9 | 2 | 0 | 1 | 21905 | 6 | 0 | 1 | 6 | 135 | 1557 | 48 |
64 | 2 | 3 | 66 | 447 | 902 | 367.7459 | 802983 | 0 | 0 | 9 | 2 | 0 | 1 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 88 |
65 | 11 | 3 | 205 | 330 | 34157 | 733.0887 | 86464 | 0 | 1 | 5 | 2 | 0 | 1 | 8287 | 1 | 0 | 1 | 2 | 198 | 397 | 55 |
66 | 2 | 3 | 66 | 220 | 9486 | 147.7409 | 1075092 | 0 | 0 | 9 | 2 | 2 | 1 | 11853 | 1 | 0 | 2 | 2 | 50 | 702 | 91 |
67 | 37 | 1 | 167 | 47 | 48840 | [null] | 529808 | 0 | 0 | 9 | 2 | 0 | 1 | 8818 | 1 | 0 | 1 | 3 | 99 | 88 | 20 |
68 | 2 | 3 | 66 | 351 | 16313 | 3052.6793 | 909565 | 0 | 0 | 9 | 1 | 0 | 1 | 512 | 1 | 0 | 1 | 2 | 50 | 711 | 42 |
69 | 2 | 3 | 66 | 442 | 35390 | 5029.2231 | 459308 | 0 | 1 | 9 | 2 | 0 | 1 | 8745 | 1 | 0 | 1 | 6 | 204 | 27 | 95 |
70 | 2 | 3 | 66 | 311 | 54525 | 1198.9608 | 1105418 | 0 | 0 | 9 | 2 | 0 | 1 | 4248 | 3 | 0 | 1 | 2 | 50 | 703 | 96 |
71 | 2 | 3 | 66 | 337 | 41380 | 9.5963 | 568989 | 0 | 0 | 9 | 2 | 0 | 1 | 5843 | 3 | 0 | 1 | 2 | 50 | 737 | 83 |
72 | 2 | 3 | 198 | 394 | 31116 | [null] | 1002738 | 1 | 1 | 9 | 2 | 1 | 1 | 8267 | 1 | 0 | 2 | 2 | 50 | 675 | 98 |
73 | 2 | 3 | 66 | 435 | 33527 | 99.5875 | 112651 | 0 | 0 | 0 | 1 | 0 | 1 | 12177 | 6 | 0 | 1 | 2 | 50 | 350 | 95 |
74 | 2 | 3 | 66 | 226 | 42300 | 1745.4772 | 329081 | 0 | 1 | 2 | 4 | 0 | 2 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 79 |
75 | 2 | 3 | 66 | 351 | 7929 | 105.2499 | 468892 | 0 | 0 | 9 | 1 | 0 | 1 | 27315 | 6 | 0 | 1 | 2 | 50 | 527 | 50 |
76 | 2 | 3 | 66 | 314 | 32759 | 237.914 | 751555 | 0 | 0 | 9 | 2 | 0 | 1 | 8274 | 1 | 0 | 1 | 2 | 50 | 684 | 33 |
77 | 2 | 3 | 66 | 318 | 43481 | 1770.9964 | 1020601 | 1 | 0 | 2 | 2 | 1 | 1 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 79 |
78 | 11 | 3 | 205 | 354 | 25315 | 444.9894 | 76259 | 0 | 0 | 2 | 2 | 0 | 1 | 12189 | 6 | 0 | 1 | 2 | 50 | 637 | 95 |
79 | 34 | 3 | 205 | 135 | 38749 | 1210.0346 | 772734 | 0 | 1 | 5 | 2 | 0 | 1 | 8250 | 1 | 0 | 3 | 2 | 50 | 628 | 88 |
80 | 2 | 3 | 66 | 333 | 6737 | 1496.2373 | 1117614 | 0 | 1 | 9 | 2 | 0 | 1 | 21059 | 6 | 0 | 2 | 4 | 8 | 110 | 96 |
81 | 2 | 3 | 23 | 48 | 4924 | [null] | 591027 | 0 | 0 | 9 | 1 | 0 | 1 | 12267 | 6 | 0 | 1 | 2 | 50 | 1230 | 5 |
82 | 2 | 3 | 66 | 174 | 26232 | 32.6626 | 361244 | 0 | 0 | 1 | 2 | 0 | 1 | 8835 | 1 | 0 | 1 | 2 | 50 | 357 | 73 |
83 | 2 | 3 | 66 | 435 | 16159 | 204.3453 | 561500 | 0 | 0 | 0 | 1 | 0 | 1 | 12177 | 6 | 0 | 1 | 2 | 50 | 350 | 13 |
84 | 2 | 3 | 66 | 258 | 4687 | 754.9028 | 1147729 | 0 | 0 | 9 | 2 | 0 | 1 | 8793 | 1 | 0 | 1 | 2 | 50 | 677 | 28 |
85 | 34 | 3 | 205 | 339 | 10935 | 3225.5579 | 497468 | 0 | 1 | 5 | 2 | 0 | 1 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 90 |
86 | 2 | 3 | 66 | 143 | 38121 | 268.2817 | 273265 | 0 | 0 | 9 | 2 | 0 | 1 | 12228 | 6 | 0 | 1 | 2 | 50 | 411 | 98 |
87 | 40 | 3 | 66 | 351 | 7929 | 412.9066 | 875633 | 0 | 0 | 9 | 1 | 1 | 1 | 25750 | 6 | 1 | 1 | 2 | 50 | 1507 | 32 |
88 | 2 | 3 | 23 | 48 | 4924 | [null] | 591027 | 0 | 0 | 9 | 2 | 0 | 1 | 12267 | 6 | 0 | 1 | 2 | 50 | 1230 | 95 |
89 | 34 | 3 | 205 | 135 | 14291 | 1846.3559 | 294858 | 0 | 0 | 9 | 3 | 0 | 2 | 12820 | 5 | 0 | 2 | 2 | 50 | 644 | 48 |
90 | 2 | 3 | 66 | 348 | 47997 | 3813.3457 | 1093118 | 0 | 0 | 0 | 2 | 0 | 1 | 8800 | 1 | 0 | 3 | 6 | 144 | 15 | 59 |
91 | 2 | 3 | 66 | 196 | 2428 | 5221.6126 | 111930 | 0 | 0 | 9 | 2 | 0 | 1 | 8225 | 1 | 0 | 1 | 4 | 124 | 1939 | 38 |
92 | 11 | 3 | 87 | 59 | 5152 | [null] | 663969 | 1 | 1 | 2 | 1 | 0 | 1 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 51 |
93 | 34 | 3 | 205 | 339 | 10935 | 3223.2475 | 497468 | 0 | 1 | 5 | 2 | 0 | 1 | 8250 | 1 | 0 | 1 | 2 | 50 | 628 | 79 |
94 | 2 | 3 | 66 | 220 | 2086 | 1098.0998 | 562773 | 0 | 1 | 9 | 2 | 0 | 1 | 8267 | 1 | 0 | 1 | 2 | 50 | 675 | 37 |
95 | 2 | 3 | 66 | 348 | 47997 | 1638.6578 | 899428 | 1 | 0 | 4 | 2 | 0 | 1 | 1760 | 1 | 0 | 1 | 4 | 51 | 197 | 84 |
96 | 2 | 3 | 66 | 442 | 19605 | 3.0158 | 31401 | 0 | 0 | 7 | 2 | 0 | 1 | 43002 | 1 | 0 | 1 | 2 | 50 | 530 | 9 |
97 | 2 | 3 | 66 | 288 | 28562 | 103.2738 | 823012 | 1 | 0 | 9 | 2 | 0 | 1 | 26740 | 6 | 0 | 2 | 2 | 50 | 192 | 39 |
98 | 2 | 3 | 66 | 311 | 4779 | 1657.8031 | 536444 | 0 | 0 | 9 | 1 | 4 | 7 | 11439 | 1 | 0 | 1 | 4 | 163 | 1503 | 52 |
99 | 2 | 3 | 154 | 196 | 38140 | [null] | 590067 | 0 | 0 | 8 | 2 | 0 | 1 | 565 | 1 | 0 | 1 | 4 | 8 | 113 | 10 |
100 | 2 | 3 | 66 | 311 | 9547 | 177.2899 | 1034727 | 0 | 0 | 9 | 3 | 0 | 1 | 45626 | 6 | 0 | 4 | 2 | 50 | 675 | 83 |
[12]:
start_time = time.time()
expedia.corr(show = False)
print(f"elapsed time = {time.time() - start_time}")
elapsed time = 23.69867491722107
In just over 20 seconds, VerticaPy computes the correlation matrix on the entire dataset. Compared to the 20 seconds it took to compute the correlation matrix on half the dataset in-memory, this is huge improvement. Using other Vertica-specific features, such as projections, we could further enhance this performance.
VerticaPy also caches the computed aggregations. With this cache available, we can repeat the correlation matrix computation almost instantaneously:
NOTE: If necessary, you can deactivate the cache by calling the set_option() function with the
cache
parameter set to False.
[13]:
start_time = time.time()
expedia.corr(show = False)
print(f"elapsed time = {time.time() - start_time}")
100%|██████████| 20/20 [00:00<00:00, 800.07it/s]
elapsed time = 0.31166625022888184
Memory usage in-memory vs. in-database#
Now, we will examine how the memory usage compares between in-memory and in-database.
First, use the pandas info()
method to explore the DataFrame’s memory usage:
[14]:
expedia_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000000 entries, 0 to 19999999
Data columns (total 21 columns):
# Column Dtype
--- ------ -----
0 site_name int64
1 posa_continent int64
2 user_location_country int64
3 user_location_region int64
4 user_location_city int64
5 orig_destination_distance float64
6 user_id int64
7 is_mobile int64
8 is_package int64
9 channel int64
10 srch_adults_cnt int64
11 srch_children_cnt int64
12 srch_rm_cnt int64
13 srch_destination_id int64
14 srch_destination_type_id int64
15 is_booking int64
16 cnt int64
17 hotel_continent int64
18 hotel_country int64
19 hotel_market int64
20 hotel_cluster int64
dtypes: float64(1), int64(20)
memory usage: 3.1 GB
The DataFrame is using 3.1GB to store half the Expedia dataset. Use the memory_usage()
method to list the expedia
vDataFrame’s memory usage:
[15]:
expedia.memory_usage()
[15]:
value | |
object | 1039 |
"site_name" | 1727 |
"posa_continent" | 1732 |
"user_location_country" | 1739 |
"user_location_region" | 1738 |
"user_location_city" | 1736 |
"orig_destination_distance" | 1743 |
"user_id" | 1725 |
"is_mobile" | 1727 |
"is_package" | 1728 |
"channel" | 1725 |
"srch_adults_cnt" | 1733 |
"srch_children_cnt" | 1735 |
"srch_rm_cnt" | 1729 |
"srch_destination_id" | 1737 |
"srch_destination_type_id" | 1742 |
"is_booking" | 1728 |
"cnt" | 1721 |
"hotel_continent" | 1733 |
"hotel_country" | 1731 |
"hotel_market" | 1730 |
"hotel_cluster" | 1731 |
total | 37409 |
The vDataFrame only uses about 37KB! By storing the data in the Vertica database, and only recording the user’s data modifications in memory, the memory usage is reduced to a minimum.
With VerticaPy, we can take advantage of Vertica’s structure and scalability, providing fast queries without ever loading the data into memory. In the above examples, we’ve seen that in-memory processing is much more expensive in both computation and memory usage. This often leads to the decesion to downsample the data, which sacrfices the possibility of further data insights.
The vDataFrame structure#
Now that we’ve seen the performance and memory benefits of the vDataFrame, let’s dig into some of the underlying structures and methods that produce these great results.
vDataFrames are composed of columns called vDataColumns. To view all vDataColumns in a vDataFrame, use the get_columns()
method:
[16]:
expedia.get_columns()
[16]:
['"site_name"',
'"posa_continent"',
'"user_location_country"',
'"user_location_region"',
'"user_location_city"',
'"orig_destination_distance"',
'"user_id"',
'"is_mobile"',
'"is_package"',
'"channel"',
'"srch_adults_cnt"',
'"srch_children_cnt"',
'"srch_rm_cnt"',
'"srch_destination_id"',
'"srch_destination_type_id"',
'"is_booking"',
'"cnt"',
'"hotel_continent"',
'"hotel_country"',
'"hotel_market"',
'"hotel_cluster"']
To access a vDataColumn, specify the column name in square brackets, for example:
NOTE: VerticaPy saves computed aggregations to avoid unncessary recomputations.
[17]:
expedia["is_booking"].describe()
[17]:
value | |
name | "is_booking" |
dtype | int |
unique | 2.0 |
count | 37670293.0 |
0 | 34669600 |
1 | 3000693 |
Each vDataColumn has its own catalog to save user modifications. In the previous example, we computed some aggregations for the is_booking
column. Let’s look at the catalog for that vDataColumn:
[18]:
expedia["is_booking"]._catalog
[18]:
{'cov': {},
'pearson': {'"site_name"': -0.0103791330144224,
'"posa_continent"': 0.00972436717006096,
'"user_location_country"': 0.00752614887257535,
'"user_location_region"': 0.00635166380501374,
'"user_location_city"': 0.00222760096536968,
'"orig_destination_distance"': -0.0377320625613809,
'"user_id"': 0.00182006552590588,
'"is_mobile"': -0.0307536571387009,
'"is_package"': -0.0763467336220978,
'"channel"': 0.0244378703859355,
'"srch_adults_cnt"': -0.0490450028206689,
'"srch_children_cnt"': -0.0222719630493467,
'"srch_rm_cnt"': 0.0108395233625253,
'"srch_destination_id"': 0.0249567593022902,
'"srch_destination_type_id"': 0.0404728725370959,
'"is_booking"': 1.0,
'"cnt"': -0.112906513841061,
'"hotel_continent"': -0.0261279829743103,
'"hotel_country"': -0.00394808105853647,
'"hotel_market"': 0.0119576115952104,
'"hotel_cluster"': -0.021548065656294},
'spearman': {},
'spearmand': {},
'kendall': {},
'cramer': {},
'biserial': {},
'regr_avgx': {},
'regr_avgy': {},
'regr_count': {},
'regr_intercept': {},
'regr_r2': {},
'regr_slope': {},
'regr_sxx': {},
'regr_sxy': {},
'regr_syy': {},
'approx_unique': 2,
'count': 37670293}
The catalog is updated whenever major changes are made to the data.
We can also view the vDataFrame’s backend SQL code generation by setting the sql_on
parameter to True with the set_option()
function:
[19]:
vp.set_option("sql_on", True)
expedia["cnt"].describe()
Computing the different aggregations.
/*+LABEL('vDataframe.aggregate')*/ APPROXIMATE_COUNT_DISTINCT("cnt")
FROM
(
SELECT
"site_name",
"posa_continent",
"user_location_country",
"user_location_region",
"user_location_city",
"orig_destination_distance",
"user_id",
"is_mobile",
"is_package",
"channel",
"srch_adults_cnt",
"srch_children_cnt",
"srch_rm_cnt",
"srch_destination_id",
"srch_destination_type_id",
"is_booking",
"cnt",
"hotel_continent",
"hotel_country",
"hotel_market",
"hotel_cluster"
FROM
"public"."train")
VERTICAPY_SUBTABLE
LIMIT 1
Computing the descriptive statistics of all numerical columns using SUMMARIZE_NUMCOL.
/*+LABEL('vDataframe.describe')*/ SUMMARIZE_NUMCOL("cnt") OVER ()
FROM
(
SELECT
"site_name",
"posa_continent",
"user_location_country",
"user_location_region",
"user_location_city",
"orig_destination_distance",
"user_id",
"is_mobile",
"is_package",
"channel",
"srch_adults_cnt",
"srch_children_cnt",
"srch_rm_cnt",
"srch_destination_id",
"srch_destination_type_id",
"is_booking",
"cnt",
"hotel_continent",
"hotel_country",
"hotel_market",
"hotel_cluster"
FROM
"public"."train")
VERTICAPY_SUBTABLE
[19]:
value | |
name | "cnt" |
dtype | int |
unique | 103.0 |
count | 37670293 |
mean | 1.48338392271067 |
std | 1.21977557865588 |
min | 1.0 |
approx_25% | 1.0 |
approx_50% | 1.0 |
approx_75% | 2.0 |
max | 269.0 |
To control whether each query outputs its elasped time, use the time_on
parameter of the set_option()
function:
NOTE: To display matplotlib graphics in Jupyter, you must use the ‘%matplotlib inline’ command the first time you draw a graphic.
[20]:
expedia = vp.vDataFrame("public.train") # creating a new vDataFrame to delete the catalog
vp.set_option("time_on", True)
expedia.corr()
Checking if the table is a flextable.
is_flextable
FROM
v_catalog.tables
WHERE table_name = 'train' AND table_schema = 'public' AND is_flextable
LIMIT 1
Getting the data types.
/*+LABEL('get_data_types')*/ column_name,
data_type
FROM
((
SELECT
column_name,
data_type,
ordinal_position
FROM
columns
WHERE table_name = 'train' AND table_schema = 'public') UNION (
SELECT
column_name,
data_type,
ordinal_position
FROM
view_columns
WHERE table_name = 'train' AND table_schema = 'public')) x
ORDER BY ordinal_position
Computing the pearson Corr Matrix.
/*+LABEL('vDataframe._aggregate_matrix')*/ CORR_MATRIX("site_name", "posa_continent", "user_location_country", "user_location_region", "user_location_city", "orig_destination_distance", "user_id", "is_mobile", "is_package", "channel", "srch_adults_cnt", "srch_children_cnt", "srch_rm_cnt", "srch_destination_id", "srch_destination_type_id", "is_booking", "cnt", "hotel_continent", "hotel_country", "hotel_market", "hotel_cluster") OVER ()
FROM
"public"."train"
Data type cannot be displayed: application/vnd.plotly.v1+json
The aggregation’s for each vDataColumn are saved to its catalog. If we again call the corr()
method, it’ll complete in a couple seconds—the time needed to draw the graphic—because the aggregations have already been computed and saved during the last call:
[22]:
start_time = time.time()
expedia.corr()
print("elapsed time = {}".format(time.time() - start_time))
Computing the pearson Corr Matrix.
/*+LABEL('vDataframe._aggregate_matrix')*/ CORR_MATRIX("site_name", "posa_continent", "user_location_country", "user_location_region", "user_location_city", "orig_destination_distance", "user_id", "is_mobile", "is_package", "channel", "srch_adults_cnt", "srch_children_cnt", "srch_rm_cnt", "srch_destination_id", "srch_destination_type_id", "is_booking", "cnt", "hotel_continent", "hotel_country", "hotel_market", "hotel_cluster") OVER ()
FROM
"public"."train"
elapsed time = 23.79699158668518
To turn off the elapsed time and the SQL code generation options:
[23]:
vp.set_option("sql_on", False)
vp.set_option("time_on", False)
You can obtain the current vDataFrame relation with the current_relation()
method:
[24]:
print(expedia.current_relation())
"public"."train"
The generated SQL for the relation changes according to the user’s modifications. For example, if we impute the missing values of the orig_destination_distance
vDataColumn by its average and then drop the is_package
vDataColumn, these changes are reflected in the relation:
[25]:
expedia["orig_destination_distance"].fillna(method = "avg")
expedia["is_package"].drop()
print(expedia.current_relation())
13525001 elements were filled.
(
SELECT
"date_time",
"site_name",
"posa_continent",
"user_location_country",
"user_location_region",
"user_location_city",
COALESCE("orig_destination_distance", 1970.0900267207) AS "orig_destination_distance",
"user_id",
"is_mobile",
"channel",
"srch_ci",
"srch_co",
"srch_adults_cnt",
"srch_children_cnt",
"srch_rm_cnt",
"srch_destination_id",
"srch_destination_type_id",
"is_booking",
"cnt",
"hotel_continent",
"hotel_country",
"hotel_market",
"hotel_cluster"
FROM
(
SELECT
"date_time",
"site_name",
"posa_continent",
"user_location_country",
"user_location_region",
"user_location_city",
"orig_destination_distance",
"user_id",
"is_mobile",
"channel",
"srch_ci",
"srch_co",
"srch_adults_cnt",
"srch_children_cnt",
"srch_rm_cnt",
"srch_destination_id",
"srch_destination_type_id",
"is_booking",
"cnt",
"hotel_continent",
"hotel_country",
"hotel_market",
"hotel_cluster"
FROM
"public"."train")
VERTICAPY_SUBTABLE)
VERTICAPY_SUBTABLE
Notice that the is_package
column has been removed from the SELECT statement and the orig_destination_distance
is now using a COALESCE SQL function.
vDataFrame attributes and management#
The vDataFrame has many attributes and methods, some of which were demonstrated in the above examples. vDataFrames have two types of attributes:
Virtual Columns (vDataColumn)
Main attributes (columns, main_relation …)
The vDataFrame’s main attributes are stored in the _vars
dictionary:
NOTE: You should never change these attributes manually.
[26]:
expedia._vars
[26]:
{'allcols_ind': 24,
'count': 37670293,
'clean_query': True,
'exclude_columns': [],
'history': ['{Wed Jan 10 14:39:26 2024} [Fillna]: 13525001 "orig_destination_distance" missing values were filled.',
'{Wed Jan 10 14:39:26 2024} [Drop]: vDataColumn "is_package" was deleted from the vDataFrame.'],
'isflex': False,
'max_columns': -1,
'max_rows': -1,
'order_by': {},
'saving': [],
'sql_push_ext': False,
'sql_magic_result': 0,
'symbol': '$',
'where': [],
'has_dpnames': False,
'columns': ['"date_time"',
'"site_name"',
'"posa_continent"',
'"user_location_country"',
'"user_location_region"',
'"user_location_city"',
'"orig_destination_distance"',
'"user_id"',
'"is_mobile"',
'"channel"',
'"srch_ci"',
'"srch_co"',
'"srch_adults_cnt"',
'"srch_children_cnt"',
'"srch_rm_cnt"',
'"srch_destination_id"',
'"srch_destination_type_id"',
'"is_booking"',
'"cnt"',
'"hotel_continent"',
'"hotel_country"',
'"hotel_market"',
'"hotel_cluster"'],
'main_relation': '"public"."train"'}
vDataFrame data types#
vDataFrames use the data types of its vDataColumns. The behavior of some vDataFrame methods depend on the data type of the columns. For example, computing a histogram for a numerical data type is not the same as computing a histogram for a categorical data type.
The vDataFrame identifies four main data types: - int
: integers are treated like categorical data types when their cardinality is low; otherwise, they are considered numeric - float
: numeric data types - date
: date-like data types (including timestamp) - text
: categorical data types
Data types not included in the above list are automatically treated as categorical. You can examine the data types of the vDataColumns in a vDataFrame using the dtypes()
method:
[27]:
expedia.dtypes()
[27]:
dtype | |
"date_time" | timestamp |
"site_name" | int |
"posa_continent" | int |
"user_location_country" | int |
"user_location_region" | int |
"user_location_city" | int |
"orig_destination_distance" | float |
"user_id" | int |
"is_mobile" | int |
"channel" | int |
"srch_ci" | date |
"srch_co" | date |
"srch_adults_cnt" | int |
"srch_children_cnt" | int |
"srch_rm_cnt" | int |
"srch_destination_id" | int |
"srch_destination_type_id" | int |
"is_booking" | int |
"cnt" | int |
"hotel_continent" | int |
"hotel_country" | int |
"hotel_market" | int |
"hotel_cluster" | int |
To convert the data type of a vDataColumn, use the astype()
method:
[28]:
expedia["hotel_market"].astype("varchar")
expedia["hotel_market"].ctype()
[28]:
'varchar'
To view the category of a specific vDataColumn, specify the vDataColumn and use the category()
method:
[29]:
expedia["hotel_market"].category()
[29]:
'text'
Exporting, saving, and loading vDataFrames#
The save()
and load()
functions allow you to save and load vDataFrames:
[30]:
expedia.save()
expedia.filter("is_booking = 1")
34669600 elements were filtered.
[30]:
📅 date_timeTimestamp(29) | 123 site_nameInteger | 123 posa_continentInteger | 123 user_location_countryInteger | 123 user_location_regionInteger | 123 user_location_cityInteger | 123 orig_destination_distanceNumeric(19) | 123 user_idInteger | 123 is_mobileInteger | 123 channelInteger | 📅 srch_ciDate | 📅 srch_coDate | 123 srch_adults_cntInteger | 123 srch_children_cntInteger | 123 srch_rm_cntInteger | 123 srch_destination_idInteger | 123 srch_destination_type_idInteger | 123 is_bookingInteger | 123 cntInteger | 123 hotel_continentInteger | 123 hotel_countryInteger | Abc hotel_marketVarchar(20) | 123 hotel_clusterInteger | |
1 | 2013-01-07 00:41:02 | 24 | 2 | 3 | 51 | 9527 | 1970.0900267207 | 259991 | 0 | 1 | 2013-05-05 | 2013-05-06 | 1 | 0 | 1 | 23507 | 6 | 1 | 1 | 6 | 70 | 19 | 9 |
2 | 2013-01-07 00:59:56 | 24 | 2 | 3 | 50 | 5703 | 1970.0900267207 | 486892 | 0 | 3 | 2013-02-08 | 2013-02-10 | 2 | 1 | 1 | 8746 | 1 | 1 | 1 | 6 | 105 | 29 | 43 |
3 | 2013-01-07 01:14:19 | 2 | 3 | 35 | 177 | 33543 | 2650.4725 | 119414 | 0 | 9 | 2013-01-17 | 2013-01-19 | 1 | 0 | 1 | 18788 | 1 | 1 | 1 | 5 | 203 | 253 | 61 |
4 | 2013-01-07 01:26:37 | 2 | 3 | 229 | 407 | 16373 | 1970.0900267207 | 217610 | 0 | 9 | 2013-01-08 | 2013-01-11 | 1 | 0 | 1 | 12014 | 1 | 1 | 1 | 2 | 50 | 644 | 15 |
5 | 2013-01-07 01:27:19 | 2 | 3 | 66 | 174 | 14752 | 342.9899 | 1043452 | 0 | 9 | 2013-03-01 | 2013-03-02 | 2 | 0 | 1 | 8279 | 1 | 1 | 1 | 2 | 50 | 1230 | 83 |
6 | 2013-01-07 02:09:57 | 2 | 3 | 235 | 13 | 53632 | 1970.0900267207 | 627910 | 0 | 9 | 2013-01-17 | 2013-01-18 | 1 | 0 | 1 | 15655 | 5 | 1 | 1 | 3 | 1 | 142 | 38 |
7 | 2013-01-07 03:26:29 | 2 | 3 | 66 | 348 | 48862 | 1100.7225 | 961860 | 0 | 9 | 2013-02-14 | 2013-02-20 | 2 | 2 | 2 | 11373 | 1 | 1 | 1 | 4 | 128 | 1455 | 69 |
8 | 2013-01-07 04:17:26 | 2 | 3 | 0 | 317 | 56136 | 7872.7567 | 438592 | 0 | 9 | 2013-01-23 | 2013-01-25 | 2 | 1 | 1 | 8273 | 1 | 1 | 1 | 2 | 50 | 660 | 94 |
9 | 2013-01-07 04:39:20 | 2 | 3 | 50 | 706 | 7736 | 1970.0900267207 | 1093463 | 0 | 1 | 2013-03-29 | 2013-03-30 | 2 | 0 | 1 | 952 | 3 | 1 | 1 | 3 | 89 | 811 | 67 |
10 | 2013-01-07 04:52:59 | 2 | 3 | 66 | 254 | 45890 | 143.3024 | 1164465 | 0 | 0 | 2013-09-14 | 2013-09-15 | 1 | 0 | 5 | 27671 | 1 | 1 | 1 | 2 | 50 | 424 | 50 |
11 | 2013-01-07 05:00:13 | 2 | 3 | 0 | 317 | 56136 | 4490.0419 | 438592 | 0 | 9 | 2013-01-12 | 2013-01-13 | 1 | 1 | 1 | 23507 | 6 | 1 | 1 | 6 | 70 | 19 | 6 |
12 | 2013-01-07 05:13:32 | 2 | 3 | 190 | 133 | 27383 | 1970.0900267207 | 857247 | 0 | 9 | 2013-01-29 | 2013-02-01 | 2 | 1 | 1 | 2710 | 3 | 1 | 1 | 6 | 144 | 1013 | 20 |
13 | 2013-01-07 05:14:37 | 2 | 3 | 66 | 189 | 45484 | 1970.0900267207 | 1186146 | 0 | 9 | 2013-01-14 | 2013-01-15 | 2 | 0 | 2 | 25335 | 6 | 1 | 1 | 2 | 50 | 191 | 68 |
14 | 2013-01-07 05:57:58 | 2 | 3 | 66 | 220 | 56137 | 211.5138 | 111728 | 0 | 9 | 2013-01-25 | 2013-01-26 | 2 | 0 | 1 | 15093 | 1 | 1 | 1 | 2 | 50 | 679 | 31 |
15 | 2013-01-07 06:15:01 | 2 | 3 | 66 | 348 | 48862 | 1.0905 | 106903 | 0 | 0 | 2013-02-04 | 2013-02-07 | 1 | 1 | 1 | 8267 | 1 | 1 | 1 | 2 | 50 | 675 | 37 |
16 | 2013-01-07 06:15:19 | 30 | 4 | 195 | 991 | 22648 | 1970.0900267207 | 992548 | 0 | 9 | 2013-01-16 | 2013-01-19 | 2 | 0 | 1 | 22258 | 6 | 1 | 1 | 0 | 34 | 43 | 99 |
17 | 2013-01-07 06:15:42 | 34 | 3 | 46 | 347 | 30537 | 108.3081 | 871047 | 0 | 9 | 2013-01-08 | 2013-01-09 | 1 | 0 | 1 | 21909 | 6 | 1 | 1 | 6 | 135 | 281 | 67 |
18 | 2013-01-07 06:31:55 | 2 | 3 | 66 | 335 | 53542 | 1866.5732 | 229324 | 0 | 9 | 2013-01-15 | 2013-01-22 | 2 | 0 | 1 | 9171 | 1 | 1 | 1 | 4 | 118 | 200 | 87 |
19 | 2013-01-07 06:32:55 | 2 | 3 | 66 | 196 | 2428 | 593.9359 | 144229 | 0 | 2 | 2013-03-06 | 2013-03-08 | 1 | 0 | 1 | 8230 | 1 | 1 | 1 | 2 | 50 | 637 | 32 |
20 | 2013-01-07 06:37:35 | 2 | 3 | 231 | 68 | 42296 | 1970.0900267207 | 433110 | 0 | 9 | 2013-01-08 | 2013-01-13 | 1 | 0 | 1 | 7146 | 3 | 1 | 1 | 2 | 50 | 663 | 48 |
21 | 2013-01-07 06:43:12 | 11 | 3 | 205 | 330 | 12163 | 2091.0945 | 422818 | 0 | 0 | 2013-02-09 | 2013-02-19 | 2 | 1 | 1 | 12188 | 6 | 1 | 1 | 4 | 8 | 110 | 73 |
22 | 2013-01-07 06:47:38 | 2 | 3 | 66 | 337 | 47407 | 1970.0900267207 | 328666 | 0 | 9 | 2013-03-02 | 2013-03-05 | 3 | 1 | 1 | 8740 | 1 | 1 | 1 | 6 | 105 | 12 | 7 |
23 | 2013-01-07 06:53:42 | 24 | 2 | 3 | 50 | 22013 | 1970.0900267207 | 4166 | 0 | 1 | 2013-01-15 | 2013-01-17 | 2 | 0 | 2 | 8818 | 1 | 1 | 1 | 3 | 99 | 88 | 82 |
24 | 2013-01-07 06:57:01 | 2 | 3 | 66 | 311 | 4779 | 245.5375 | 798321 | 0 | 9 | 2013-01-11 | 2013-01-14 | 2 | 0 | 1 | 26227 | 6 | 1 | 1 | 2 | 198 | 384 | 55 |
25 | 2013-01-07 07:00:35 | 11 | 3 | 205 | 385 | 34274 | 1970.0900267207 | 446411 | 0 | 4 | 2013-01-10 | 2013-01-14 | 2 | 0 | 2 | 3262 | 3 | 1 | 1 | 2 | 198 | 937 | 7 |
26 | 2013-01-07 07:12:36 | 2 | 3 | 66 | 174 | 14752 | 468.8397 | 1043452 | 0 | 0 | 2013-02-25 | 2013-02-26 | 2 | 0 | 1 | 8833 | 1 | 1 | 1 | 2 | 50 | 1012 | 93 |
27 | 2013-01-07 07:18:53 | 2 | 3 | 179 | 18 | 30570 | 1970.0900267207 | 801328 | 0 | 9 | 2013-01-16 | 2013-01-17 | 2 | 1 | 1 | 26964 | 1 | 1 | 1 | 2 | 50 | 410 | 16 |
28 | 2013-01-07 07:29:56 | 2 | 3 | 66 | 293 | 34807 | 99.6924 | 135775 | 0 | 9 | 2013-01-21 | 2013-01-24 | 1 | 1 | 1 | 12364 | 5 | 1 | 1 | 2 | 50 | 647 | 21 |
29 | 2013-01-07 07:32:03 | 2 | 3 | 80 | 39 | 16563 | 1970.0900267207 | 609648 | 0 | 2 | 2013-01-25 | 2013-01-28 | 1 | 0 | 1 | 8213 | 1 | 1 | 1 | 6 | 68 | 275 | 83 |
30 | 2013-01-07 07:32:54 | 2 | 3 | 66 | 258 | 4687 | 826.8573 | 391512 | 0 | 9 | 2013-02-12 | 2013-02-16 | 1 | 0 | 1 | 8239 | 1 | 1 | 1 | 2 | 50 | 407 | 93 |
31 | 2013-01-07 07:37:48 | 11 | 3 | 205 | 354 | 48951 | 3517.9097 | 319264 | 0 | 4 | 2013-04-14 | 2013-04-16 | 2 | 0 | 1 | 12731 | 5 | 1 | 1 | 6 | 170 | 773 | 36 |
32 | 2013-01-07 07:38:25 | 11 | 3 | 205 | 385 | 46963 | 0.4382 | 1069720 | 0 | 4 | 2013-01-18 | 2013-01-19 | 1 | 0 | 1 | 8262 | 1 | 1 | 1 | 2 | 198 | 384 | 95 |
33 | 2013-01-07 07:39:17 | 8 | 4 | 77 | 824 | 23566 | 451.4598 | 965461 | 0 | 9 | 2013-02-14 | 2013-02-15 | 2 | 0 | 2 | 22553 | 6 | 1 | 1 | 0 | 63 | 962 | 2 |
34 | 2013-01-07 07:45:04 | 2 | 3 | 66 | 260 | 30771 | 396.0527 | 555580 | 0 | 9 | 2013-02-21 | 2013-02-24 | 1 | 0 | 1 | 14811 | 1 | 1 | 1 | 2 | 50 | 512 | 7 |
35 | 2013-01-07 07:48:53 | 2 | 3 | 133 | 26 | 49465 | 1970.0900267207 | 665776 | 0 | 9 | 2013-01-10 | 2013-01-11 | 2 | 2 | 1 | 22616 | 6 | 1 | 1 | 6 | 204 | 1452 | 75 |
36 | 2013-01-07 07:51:35 | 11 | 3 | 205 | 385 | 34274 | 1970.0900267207 | 446411 | 0 | 4 | 2013-01-14 | 2013-01-15 | 1 | 0 | 1 | 12942 | 5 | 1 | 1 | 2 | 198 | 384 | 50 |
37 | 2013-01-07 08:01:50 | 2 | 3 | 66 | 174 | 16878 | 6482.2277 | 1076786 | 0 | 9 | 2013-03-02 | 2013-03-09 | 1 | 1 | 1 | 8217 | 1 | 1 | 1 | 0 | 34 | 43 | 85 |
38 | 2013-01-07 08:04:47 | 2 | 3 | 66 | 442 | 76 | 1479.376 | 119642 | 0 | 9 | 2013-02-04 | 2013-02-07 | 1 | 0 | 1 | 12269 | 6 | 1 | 1 | 2 | 50 | 1230 | 50 |
39 | 2013-01-07 08:05:51 | 2 | 3 | 66 | 220 | 2511 | 2176.807 | 185301 | 1 | 9 | 2013-02-13 | 2013-02-16 | 2 | 0 | 1 | 12206 | 6 | 1 | 1 | 2 | 50 | 628 | 45 |
40 | 2013-01-07 08:08:01 | 37 | 1 | 69 | 959 | 20186 | 1970.0900267207 | 10762 | 0 | 9 | 2013-04-05 | 2013-04-17 | 4 | 1 | 2 | 8818 | 1 | 1 | 1 | 3 | 99 | 88 | 46 |
41 | 2013-01-07 08:11:49 | 2 | 3 | 66 | 258 | 9875 | 411.3601 | 766627 | 0 | 9 | 2013-01-10 | 2013-01-12 | 3 | 0 | 1 | 26740 | 6 | 1 | 1 | 2 | 50 | 192 | 28 |
42 | 2013-01-07 08:14:58 | 2 | 3 | 66 | 184 | 11878 | 826.1879 | 448154 | 1 | 9 | 2013-01-08 | 2013-01-10 | 2 | 2 | 1 | 12603 | 5 | 1 | 1 | 2 | 50 | 365 | 7 |
43 | 2013-01-07 08:15:10 | 2 | 3 | 66 | 331 | 9189 | 225.4191 | 18167 | 0 | 9 | 2013-01-17 | 2013-01-18 | 1 | 0 | 1 | 11928 | 6 | 1 | 1 | 2 | 50 | 191 | 7 |
44 | 2013-01-07 08:21:23 | 2 | 3 | 66 | 220 | 4948 | 647.0217 | 656567 | 0 | 9 | 2013-01-08 | 2013-01-09 | 1 | 0 | 1 | 12016 | 1 | 1 | 1 | 2 | 50 | 741 | 94 |
45 | 2013-01-07 08:22:05 | 11 | 3 | 205 | 354 | 33452 | 1284.8882 | 716411 | 0 | 9 | 2013-02-07 | 2013-02-08 | 1 | 1 | 1 | 8260 | 1 | 1 | 1 | 2 | 50 | 701 | 77 |
46 | 2013-01-07 08:24:50 | 2 | 3 | 66 | 321 | 9101 | 1397.5619 | 82641 | 0 | 1 | 2013-04-28 | 2013-05-02 | 2 | 0 | 1 | 8250 | 1 | 1 | 1 | 2 | 50 | 628 | 79 |
47 | 2013-01-07 08:29:49 | 2 | 3 | 66 | 315 | 33371 | 106.0414 | 466806 | 0 | 9 | 2013-02-01 | 2013-02-02 | 4 | 0 | 2 | 9524 | 1 | 1 | 1 | 2 | 50 | 561 | 42 |
48 | 2013-01-07 08:49:22 | 2 | 3 | 66 | 184 | 54612 | 110.1167 | 1053407 | 0 | 9 | 2013-01-07 | 2013-01-08 | 1 | 0 | 1 | 12452 | 5 | 1 | 1 | 2 | 50 | 656 | 28 |
49 | 2013-01-07 08:50:12 | 2 | 3 | 66 | 351 | 7929 | 966.7501 | 554890 | 0 | 9 | 2013-03-01 | 2013-03-05 | 2 | 0 | 1 | 8260 | 1 | 1 | 1 | 2 | 50 | 701 | 17 |
50 | 2013-01-07 08:51:57 | 2 | 3 | 66 | 363 | 26006 | 2381.5825 | 988395 | 0 | 2 | 2013-01-23 | 2013-01-28 | 0 | 0 | 1 | 12132 | 1 | 1 | 1 | 2 | 50 | 366 | 48 |
51 | 2013-01-07 08:52:22 | 2 | 3 | 66 | 448 | 44417 | 97.0171 | 887225 | 0 | 9 | 2013-01-18 | 2013-01-20 | 2 | 1 | 2 | 11358 | 1 | 1 | 1 | 2 | 50 | 570 | 6 |
52 | 2013-01-07 09:07:47 | 2 | 3 | 66 | 348 | 18487 | 4.7465 | 183665 | 0 | 9 | 2013-01-11 | 2013-01-13 | 1 | 0 | 1 | 24799 | 6 | 1 | 1 | 2 | 50 | 609 | 72 |
53 | 2013-01-07 09:19:41 | 37 | 1 | 69 | 900 | 13439 | 1970.0900267207 | 749006 | 1 | 4 | 2013-01-07 | 2013-01-10 | 1 | 0 | 1 | 2210 | 3 | 1 | 1 | 6 | 70 | 1915 | 30 |
54 | 2013-01-07 09:23:31 | 2 | 3 | 66 | 356 | 12482 | 618.1144 | 752237 | 0 | 2 | 2013-04-19 | 2013-04-25 | 2 | 0 | 1 | 12175 | 6 | 1 | 1 | 2 | 50 | 366 | 31 |
55 | 2013-01-07 09:26:44 | 2 | 3 | 66 | 258 | 20843 | 889.6633 | 657617 | 0 | 0 | 2013-01-13 | 2013-01-16 | 2 | 0 | 1 | 12196 | 6 | 1 | 1 | 2 | 50 | 656 | 91 |
56 | 2013-01-07 09:27:53 | 2 | 3 | 66 | 184 | 27951 | 94.2886 | 814964 | 0 | 9 | 2013-01-09 | 2013-01-10 | 2 | 2 | 1 | 12196 | 6 | 1 | 1 | 2 | 50 | 656 | 28 |
57 | 2013-01-07 09:30:17 | 2 | 3 | 46 | 157 | 34868 | 1.9242 | 1088771 | 0 | 9 | 2013-01-09 | 2013-01-11 | 2 | 0 | 1 | 8437 | 6 | 1 | 1 | 6 | 144 | 4 | 15 |
58 | 2013-01-07 09:30:32 | 2 | 3 | 66 | 256 | 53274 | 290.8555 | 958039 | 0 | 9 | 2013-01-18 | 2013-01-20 | 2 | 1 | 1 | 8276 | 1 | 1 | 1 | 2 | 50 | 661 | 48 |
59 | 2013-01-07 09:32:24 | 2 | 3 | 66 | 174 | 55926 | 36.4296 | 7544 | 0 | 9 | 2013-01-25 | 2013-01-26 | 2 | 2 | 1 | 24123 | 6 | 1 | 1 | 2 | 50 | 363 | 31 |
60 | 2013-01-07 09:33:55 | 2 | 3 | 66 | 346 | 31371 | 5582.4818 | 561643 | 0 | 0 | 2013-07-30 | 2013-08-03 | 2 | 1 | 1 | 8739 | 1 | 1 | 1 | 6 | 144 | 4 | 78 |
61 | 2013-01-07 09:35:10 | 2 | 3 | 66 | 174 | 25951 | 1558.7211 | 538076 | 0 | 9 | 2013-01-15 | 2013-01-16 | 1 | 0 | 1 | 26756 | 6 | 1 | 1 | 2 | 50 | 414 | 18 |
62 | 2013-01-07 09:39:52 | 2 | 3 | 66 | 294 | 2078 | 598.8608 | 564188 | 0 | 9 | 2013-01-13 | 2013-01-17 | 1 | 0 | 1 | 4375 | 3 | 1 | 1 | 2 | 50 | 1058 | 91 |
63 | 2013-01-07 09:43:55 | 2 | 3 | 66 | 348 | 48862 | 1584.7565 | 372001 | 0 | 9 | 2013-04-04 | 2013-04-07 | 2 | 0 | 1 | 8277 | 1 | 1 | 1 | 2 | 50 | 412 | 4 |
64 | 2013-01-07 09:43:57 | 2 | 3 | 66 | 435 | 30431 | 8176.5169 | 751319 | 0 | 9 | 2013-01-25 | 2013-01-26 | 1 | 0 | 1 | 3171 | 3 | 1 | 1 | 3 | 171 | 61 | 64 |
65 | 2013-01-07 09:46:45 | 2 | 3 | 66 | 174 | 51219 | 209.9207 | 1047580 | 0 | 9 | 2013-01-09 | 2013-01-10 | 2 | 0 | 1 | 8250 | 1 | 1 | 1 | 2 | 50 | 628 | 54 |
66 | 2013-01-07 09:56:11 | 2 | 3 | 66 | 348 | 33705 | 1970.0900267207 | 403847 | 0 | 0 | 2013-01-12 | 2013-01-13 | 2 | 0 | 1 | 24629 | 6 | 1 | 1 | 2 | 50 | 420 | 42 |
67 | 2013-01-07 09:57:12 | 37 | 1 | 69 | 596 | 24410 | 1354.78 | 922482 | 0 | 0 | 2013-02-12 | 2013-02-16 | 4 | 0 | 2 | 20147 | 1 | 1 | 1 | 6 | 111 | 244 | 46 |
68 | 2013-01-07 10:02:42 | 2 | 3 | 66 | 363 | 16503 | 1970.0900267207 | 72073 | 0 | 1 | 2013-01-10 | 2013-01-12 | 1 | 0 | 1 | 11373 | 1 | 1 | 1 | 4 | 128 | 1455 | 26 |
69 | 2013-01-07 10:10:53 | 17 | 1 | 133 | 12 | 47083 | 1970.0900267207 | 705176 | 0 | 0 | 2013-02-01 | 2013-02-04 | 1 | 0 | 1 | 8279 | 1 | 1 | 1 | 2 | 50 | 1230 | 70 |
70 | 2013-01-07 10:11:03 | 2 | 3 | 66 | 346 | 29217 | 2.3981 | 560722 | 0 | 9 | 2013-01-25 | 2013-01-26 | 2 | 0 | 1 | 23293 | 6 | 1 | 1 | 2 | 50 | 633 | 54 |
71 | 2013-01-07 10:13:07 | 2 | 3 | 66 | 220 | 39730 | 1970.0900267207 | 813085 | 0 | 9 | 2013-06-09 | 2013-06-11 | 2 | 1 | 1 | 19535 | 1 | 1 | 1 | 2 | 50 | 593 | 5 |
72 | 2013-01-07 10:14:46 | 2 | 3 | 66 | 226 | 42300 | 8385.6368 | 426784 | 0 | 9 | 2013-03-26 | 2013-03-27 | 2 | 0 | 1 | 11683 | 6 | 1 | 1 | 3 | 171 | 61 | 98 |
73 | 2013-01-07 10:17:44 | 2 | 3 | 66 | 174 | 27251 | 334.5682 | 656573 | 0 | 9 | 2013-02-07 | 2013-02-10 | 2 | 0 | 1 | 1531 | 3 | 1 | 1 | 2 | 50 | 365 | 33 |
74 | 2013-01-07 10:18:47 | 2 | 3 | 66 | 184 | 13111 | 1871.5249 | 807452 | 0 | 9 | 2013-02-06 | 2013-02-07 | 1 | 0 | 1 | 20192 | 3 | 1 | 1 | 2 | 50 | 529 | 28 |
75 | 2013-01-07 10:19:03 | 37 | 1 | 69 | 682 | 52436 | 1970.0900267207 | 125562 | 0 | 0 | 2013-01-21 | 2013-01-23 | 1 | 0 | 1 | 8797 | 1 | 1 | 1 | 6 | 144 | 1450 | 59 |
76 | 2013-01-07 10:20:06 | 34 | 3 | 205 | 135 | 27655 | 185.131 | 936657 | 0 | 9 | 2013-02-21 | 2013-02-23 | 2 | 0 | 1 | 25944 | 6 | 1 | 1 | 2 | 198 | 370 | 10 |
77 | 2013-01-07 10:21:08 | 2 | 3 | 66 | 442 | 46407 | 157.744 | 335913 | 0 | 9 | 2013-01-08 | 2013-01-09 | 1 | 1 | 1 | 8218 | 1 | 1 | 1 | 2 | 50 | 743 | 48 |
78 | 2013-01-07 10:23:13 | 2 | 3 | 12 | 790 | 53139 | 1970.0900267207 | 345926 | 0 | 9 | 2013-03-01 | 2013-03-12 | 2 | 1 | 1 | 44201 | 8 | 1 | 1 | 2 | 50 | 676 | 32 |
79 | 2013-01-07 10:26:41 | 2 | 3 | 66 | 363 | 12346 | 81.4298 | 858744 | 0 | 9 | 2013-01-10 | 2013-01-11 | 1 | 0 | 1 | 45631 | 6 | 1 | 1 | 2 | 50 | 675 | 41 |
80 | 2013-01-07 10:28:30 | 2 | 3 | 66 | 184 | 53980 | 5647.1654 | 844887 | 0 | 9 | 2013-01-10 | 2013-01-15 | 1 | 0 | 1 | 12660 | 5 | 1 | 1 | 4 | 124 | 1946 | 36 |
81 | 2013-01-07 10:32:26 | 11 | 3 | 205 | 354 | 43201 | 1041.7216 | 748599 | 0 | 9 | 2013-02-01 | 2013-02-02 | 2 | 0 | 1 | 41743 | 4 | 1 | 1 | 2 | 50 | 582 | 13 |
82 | 2013-01-07 10:39:12 | 2 | 3 | 66 | 314 | 43382 | 777.4771 | 1100704 | 0 | 9 | 2013-01-07 | 2013-01-08 | 2 | 1 | 1 | 8268 | 1 | 1 | 1 | 2 | 50 | 682 | 75 |
83 | 2013-01-07 10:39:50 | 2 | 3 | 134 | 48 | 31289 | 1970.0900267207 | 766457 | 0 | 9 | 2013-04-13 | 2013-04-16 | 4 | 0 | 2 | 8808 | 1 | 1 | 1 | 6 | 169 | 28 | 9 |
84 | 2013-01-07 10:40:47 | 2 | 3 | 66 | 448 | 47357 | 2322.2325 | 728749 | 0 | 9 | 2013-01-20 | 2013-01-26 | 1 | 0 | 1 | 8281 | 1 | 1 | 1 | 2 | 50 | 663 | 49 |
85 | 2013-01-07 10:48:52 | 2 | 3 | 66 | 174 | 52010 | 1970.0900267207 | 1016574 | 0 | 9 | 2013-01-11 | 2013-01-12 | 1 | 0 | 1 | 24977 | 6 | 1 | 1 | 2 | 50 | 365 | 17 |
86 | 2013-01-07 10:49:02 | 2 | 3 | 66 | 174 | 17494 | 26.4261 | 696583 | 0 | 9 | 2013-02-04 | 2013-02-07 | 1 | 0 | 1 | 3547 | 3 | 1 | 1 | 2 | 50 | 1230 | 10 |
87 | 2013-01-07 10:49:09 | 2 | 3 | 66 | 196 | 2428 | 3833.6279 | 551478 | 0 | 9 | 2013-06-21 | 2013-06-23 | 3 | 1 | 2 | 8745 | 1 | 1 | 1 | 6 | 204 | 27 | 97 |
88 | 2013-01-07 10:49:57 | 34 | 3 | 205 | 312 | 10553 | 812.0063 | 106944 | 0 | 9 | 2013-03-23 | 2013-03-29 | 2 | 2 | 1 | 11629 | 1 | 1 | 1 | 2 | 198 | 370 | 68 |
89 | 2013-01-07 10:51:36 | 2 | 3 | 66 | 220 | 43026 | 1272.3001 | 640828 | 0 | 9 | 2013-02-10 | 2013-02-13 | 2 | 0 | 2 | 12257 | 6 | 1 | 1 | 4 | 51 | 1509 | 34 |
90 | 2013-01-07 10:51:39 | 2 | 3 | 66 | 220 | 2086 | 2179.5787 | 417637 | 0 | 9 | 2013-01-28 | 2013-01-31 | 2 | 1 | 1 | 8250 | 1 | 1 | 1 | 2 | 50 | 628 | 79 |
91 | 2013-01-07 11:04:58 | 34 | 3 | 205 | 411 | 54864 | 744.8963 | 324697 | 0 | 9 | 2013-03-04 | 2013-03-05 | 2 | 0 | 1 | 26414 | 6 | 1 | 2 | 2 | 198 | 399 | 10 |
92 | 2013-01-07 11:05:13 | 2 | 3 | 66 | 174 | 46432 | 134.0731 | 339047 | 0 | 1 | 2013-01-07 | 2013-01-10 | 2 | 0 | 1 | 24296 | 6 | 1 | 1 | 2 | 50 | 355 | 3 |
93 | 2013-01-07 11:09:56 | 13 | 1 | 46 | 244 | 45371 | 1970.0900267207 | 686091 | 0 | 9 | 2013-05-24 | 2013-05-25 | 2 | 1 | 1 | 23988 | 1 | 1 | 1 | 6 | 105 | 1837 | 85 |
94 | 2013-01-07 11:12:37 | 2 | 3 | 66 | 348 | 22772 | 1970.0900267207 | 1115906 | 0 | 9 | 2013-01-07 | 2013-01-08 | 1 | 0 | 1 | 25819 | 6 | 1 | 1 | 2 | 50 | 697 | 47 |
95 | 2013-01-07 11:19:00 | 2 | 3 | 66 | 184 | 22012 | 1970.0900267207 | 418305 | 0 | 9 | 2013-01-26 | 2013-01-28 | 1 | 1 | 1 | 8052 | 3 | 1 | 1 | 3 | 104 | 1003 | 46 |
96 | 2013-01-07 11:20:03 | 2 | 3 | 66 | 258 | 42500 | 3962.4226 | 351105 | 0 | 9 | 2013-02-12 | 2013-02-13 | 2 | 0 | 1 | 22869 | 6 | 1 | 1 | 4 | 93 | 2085 | 36 |
97 | 2013-01-07 11:21:50 | 2 | 3 | 66 | 220 | 3781 | 4338.0481 | 1040971 | 0 | 9 | 2013-02-03 | 2013-02-05 | 1 | 0 | 1 | 18481 | 1 | 1 | 1 | 6 | 70 | 309 | 77 |
98 | 2013-01-07 11:23:48 | 2 | 3 | 66 | 184 | 7976 | 1970.0900267207 | 808937 | 0 | 9 | 2013-01-21 | 2013-01-23 | 1 | 0 | 1 | 25625 | 6 | 1 | 1 | 2 | 50 | 562 | 91 |
99 | 2013-01-07 11:28:20 | 11 | 3 | 205 | 155 | 22091 | 44.9115 | 537131 | 0 | 9 | 2013-01-15 | 2013-01-16 | 1 | 0 | 1 | 26378 | 1 | 1 | 1 | 2 | 198 | 1671 | 43 |
100 | 2013-01-07 11:29:52 | 11 | 3 | 205 | 155 | 13951 | 28.4059 | 677922 | 0 | 9 | 2013-01-09 | 2013-01-10 | 2 | 0 | 1 | 23529 | 6 | 1 | 1 | 2 | 50 | 435 | 13 |
To return a vDataFrame to a previously saved structure, use the load()
function:
[31]:
expedia = expedia.load()
print(expedia.shape())
(37670293, 23)
Tip: For more information about a function, use the help() function
Because vDataFrames are views of data stored in the connected Vertica database, any modifications made to the vDataFrame are not reflected in the underlying data in the database. To save a vDataFrame’s relation to the database, use the to_db()
method.
It’s good practice to examine the expected disk usage of the vDataFrame before exporting it to the database:
[33]:
expedia.expected_store_usage(unit = "Gb")
100%|██████████| 2/2 [00:03<00:00, 1.74s/it]
[33]:
expected_size (Gb) | max_size (Gb) | type | |
"date_time" | 7.450580596923828e-09 | 0.2806655541062355 | timestamp |
"site_name" | 0.047050174325704575 | 0.2806655541062355 | int |
"posa_continent" | 0.03508319426327944 | 0.2806655541062355 | int |
"user_location_country" | 0.07453343831002712 | 0.2806655541062355 | int |
"user_location_region" | 0.09974934346973896 | 0.2806655541062355 | int |
"user_location_city" | 0.16734247468411922 | 0.2806655541062355 | int |
"orig_destination_distance" | 7.450580596923828e-09 | 0.2806655541062355 | float |
"user_id" | 0.21339667681604624 | 0.2806655541062355 | int |
"is_mobile" | 0.03508319426327944 | 0.2806655541062355 | int |
"channel" | 0.03508897125720978 | 0.2806655541062355 | int |
"srch_ci" | 7.450580596923828e-09 | 0.28031475841999054 | date |
"srch_co" | 7.450580596923828e-09 | 0.28031475096940994 | date |
"srch_adults_cnt" | 0.03508319426327944 | 0.2806655541062355 | int |
"srch_children_cnt" | 0.03508319426327944 | 0.2806655541062355 | int |
"srch_rm_cnt" | 0.03508319426327944 | 0.2806655541062355 | int |
"srch_destination_id" | 0.15629629883915186 | 0.2806655541062355 | int |
"srch_destination_type_id" | 0.03508319426327944 | 0.2806655541062355 | int |
"is_booking" | 0.03508319426327944 | 0.2806655541062355 | int |
"cnt" | 0.03518901206552982 | 0.2806655541062355 | int |
"hotel_continent" | 0.03508319426327944 | 0.2806655541062355 | int |
"hotel_country" | 0.0782695272937417 | 0.2806655541062355 | int |
"hotel_market" | 0.10538724157959223 | 2.806655541062355 | varchar |
"hotel_cluster" | 0.06651502475142479 | 0.2806655541062355 | int |
separator | 0.8069134680554271 | 0.8069134680554271 | |
header | 3.4831464290618896e-07 | 3.4831464290618896e-07 | |
rawsize | 2.166397583670914 | 9.787509948946536 |
If you decide that there is sufficient space to store the vDataFrame in the database, run the to_db()
method:
[ ]:
expedia.to_db("public.expedia_clean",
relation_type = "table")