Best practices#

In this tutorial, we will explore some best practices and optimizations to help you get the most out of Vertica and VerticaPy.

1. Restrict objects and operations to essential columns#

As VerticaPy is effectively an abstraction of SQL, any database-level optimizations you make in your Vertica database carry over to VerticaPy. In Vertica, optimization is centered on projections, which are collections of table columns—from one or more tables—stored on disk in a format that optimizes query execution. When you write queries in terms of the original tables, the query uses the projections to return query results. For details about creating and designing projections, see the Projections section in the Vertica documentation.

Projections are created and managed in the Vertica database, but you can leverage the power of projections in VerticaPy with features such as the vDataFrame’s usecols parameter, which specifies the columns from the input relation to include in the vDataFrame. As columnar databases perform better when there are fewer columns in the query, especially when you are working with large datasets, limiting vDataFrames and operations to essential columns can lead to a significant performance improvement. By default, most vDataFrame methods use all numerical columns in the vDataFrame, but you can restrict the operation to specific columns.

In the following examples, we’ll demonstrate how to create a vDataFrame from specific columns in the input relation, and then run methods on that vDataFrame. First, load the titanic dataset into Vertica using the load_titanic() function:

[2]:

from verticapy.datasets import load_titanic

load_titanic()

[2]:

	123 pclass Integer	123 survived Integer	Abc Varchar(164)	Abc sex Varchar(20)	123 age Numeric(8)	123 sibsp Integer	123 parch Integer	Abc ticket Varchar(36)	123 fare Numeric(12)	Abc cabin Varchar(30)	Abc embarked Varchar(20)	Abc boat Varchar(100)	123 body Integer	Abc Varchar(100)
1	1	0		male	47.0	1	0	PC 17757	227.525	C62 C64	C	[null]	124
2	1	0		male	24.0	0	1	PC 17558	247.5208	B58 B60	C	[null]	[null]
3	1	0		male	25.0	0	0	13905	26.0	[null]	C	[null]	148
4	1	0		male	42.0	0	0	110489	26.55	D22	S	[null]	[null]
5	1	0		male	45.0	0	0	113050	26.55	B38	S	[null]	[null]
6	1	0		male	46.0	1	0	W.E.P. 5734	61.175	E31	S	[null]	[null]
7	1	0		male	64.0	1	4	19950	263.0	C23 C25 C27	S	[null]	[null]
8	1	0		male	[null]	0	0	113796	42.4	[null]	S	[null]	[null]
9	1	0		male	32.5	0	0	113503	211.5	C132	C	[null]	45
10	1	0		male	55.0	0	0	113787	30.5	C30	S	[null]	[null]
11	1	0		male	37.0	0	1	PC 17596	29.7	C118	C	[null]	[null]
12	1	0		male	64.0	0	0	693	26.0	[null]	S	[null]	263
13	1	0		male	56.0	0	0	113792	26.55	[null]	S	[null]	[null]
14	1	0		male	24.0	1	0	13695	60.0	C31	S	[null]	[null]
15	1	1		male	0.92	1	2	113781	151.55	C22 C26	S	11	[null]
16	1	1		female	18.0	1	0	PC 17757	227.525	C62 C64	C	4	[null]
17	1	1		male	80.0	0	0	27042	30.0	A23	S	B	[null]
18	1	1		female	42.0	0	0	PC 17757	227.525	[null]	C	4	[null]
19	1	1		male	25.0	1	0	11967	91.0792	B49	C	7	[null]
20	1	1		female	45.0	0	0	PC 17608	262.375	[null]	C	4	[null]
21	1	1		female	22.0	0	1	113505	55.0	E33	S	6	[null]
22	1	1		female	[null]	0	0	17770	27.7208	[null]	C	5	[null]
23	1	1		female	22.0	0	0	113781	151.55	[null]	S	11	[null]
24	1	1		female	64.0	0	2	PC 17756	83.1583	E45	C	14	[null]
25	1	1		female	54.0	1	0	36947	78.2667	D20	C	4	[null]
26	1	1		male	43.0	1	0	17765	27.7208	D40	C	5	[null]
27	1	1		female	22.0	0	2	13568	49.5	B39	C	5	[null]
28	1	1		male	23.0	0	1	PC 17759	63.3583	D10 D12	C	7	[null]
29	1	1		female	35.0	1	0	113789	52.0	[null]	S	8	[null]
30	1	1		female	[null]	1	0	17464	51.8625	D21	S	8	[null]
31	1	1		male	42.0	1	0	11753	52.5542	D19	S	5	[null]
32	1	1		female	45.0	1	0	11753	52.5542	D19	S	5	[null]
33	1	1		female	16.0	0	1	PC 17592	39.4	D28	S	9	[null]
34	1	1		female	21.0	0	0	13502	77.9583	D9	S	10	[null]
35	1	1		male	36.0	0	0	PC 17473	26.2875	E25	S	7	[null]
36	1	1		male	[null]	0	0	F.C. 12998	25.7417	[null]	C	7	[null]
37	1	1		female	33.0	0	0	PC 17613	27.7208	A11	C	11	[null]
38	1	1		female	48.0	1	3	PC 17608	262.375	B57 B59 B63 B66	C	4	[null]
39	1	1		female	23.0	1	0	21228	82.2667	B45	S	7	[null]
40	1	1		female	31.0	0	2	36928	164.8667	C7	S	8	[null]
41	2	0		male	57.0	0	0	244346	13.0	[null]	S	[null]	[null]
42	2	0		female	29.0	1	0	SC/AH 29037	26.0	[null]	S	[null]	[null]
43	2	0		male	29.0	0	0	W./C. 14263	10.5	[null]	S	[null]	[null]
44	2	0		female	30.0	0	0	237249	13.0	[null]	S	[null]	[null]
45	2	0		male	24.0	0	0	248726	13.5	[null]	S	[null]	297
46	2	0		male	30.0	0	0	250646	13.0	[null]	S	[null]	305
47	2	0		male	52.0	0	0	250647	13.0	[null]	S	[null]	19
48	2	0		male	44.0	1	0	26707	26.0	[null]	S	[null]	[null]
49	2	0		male	57.0	0	0	219533	12.35	[null]	Q	[null]	[null]
50	2	0		male	32.0	0	0	237216	13.5	[null]	S	[null]	209
51	2	0		male	70.0	0	0	C.A. 24580	10.5	[null]	S	[null]	[null]
52	2	0		male	54.0	0	0	29011	14.0	[null]	S	[null]	[null]
53	2	0		male	16.0	0	0	S.O./P.P. 3	10.5	[null]	S	[null]	[null]
54	2	0		male	62.0	0	0	240276	9.6875	[null]	Q	[null]	[null]
55	2	0		male	27.0	0	0	SC/PARIS 2168	15.0333	[null]	C	[null]	[null]
56	2	0		male	36.0	0	0	C.A. 17248	10.5	[null]	S	[null]	[null]
57	2	0		female	27.0	1	0	11668	21.0	[null]	S	[null]	[null]
58	2	0		male	27.0	1	0	228414	26.0	[null]	S	[null]	293
59	2	0		male	66.0	0	0	C.A. 24579	10.5	[null]	S	[null]	[null]
60	2	1		female	48.0	0	2	C.A. 33112	36.75	[null]	S	14	[null]
61	2	1		female	30.0	1	0	SC/PARIS 2148	13.8583	[null]	C	12	[null]
62	2	1		female	34.0	0	0	243880	13.0	[null]	S	12	[null]
63	2	1		female	24.0	0	2	250649	14.5	[null]	S	4	[null]
64	2	1		female	48.0	1	2	220845	65.0	[null]	S	9	[null]
65	2	1		female	3.0	1	2	SC/Paris 2123	41.5792	[null]	C	14	[null]
66	2	1		male	3.0	1	1	230080	26.0	F2	S	D	[null]
67	2	1		female	2.0	1	1	26360	26.0	[null]	S	11	[null]
68	2	1		female	18.0	0	2	250652	13.0	[null]	S	16	[null]
69	3	0		male	30.0	0	0	C 7076	7.25	[null]	S	[null]	72
70	3	0		female	40.0	1	0	7546	9.475	[null]	S	[null]	[null]
71	3	0		male	35.0	0	0	373450	8.05	[null]	S	[null]	[null]
72	3	0		female	18.0	0	1	2691	14.4542	[null]	C	[null]	[null]
73	3	0		male	[null]	0	0	2664	7.225	[null]	C	[null]	[null]
74	3	0		female	32.0	1	1	364849	15.5	[null]	Q	[null]	[null]
75	3	0		male	22.0	0	0	350045	7.7958	[null]	S	[null]	[null]
76	3	0		male	35.0	0	0	364512	8.05	[null]	S	[null]	[null]
77	3	0		male	[null]	1	0	2689	14.4583	[null]	C	[null]	[null]
78	3	0		female	[null]	1	0	2689	14.4583	[null]	C	[null]	[null]
79	3	0		male	31.0	0	0	335097	7.75	[null]	Q	[null]	[null]
80	3	0		female	22.0	0	0	7552	10.5167	[null]	S	[null]	[null]
81	3	0		male	0.33	0	2	347080	14.4	[null]	S	[null]	[null]
82	3	0		male	34.0	1	1	347080	14.4	[null]	S	[null]	197
83	3	0		female	28.0	1	1	347080	14.4	[null]	S	[null]	[null]
84	3	0		male	25.0	0	0	349203	7.8958	[null]	S	[null]	[null]
85	3	0		male	25.0	0	0	349250	7.8958	[null]	S	[null]	[null]
86	3	0		male	[null]	0	0	349238	7.8958	[null]	S	[null]	[null]
87	3	0		male	[null]	0	0	349225	7.8958	[null]	S	[null]	[null]
88	3	0		male	22.0	0	0	A/5 21172	7.25	[null]	S	[null]	[null]
89	3	0		male	[null]	0	0	2674	7.225	[null]	C	[null]	[null]
90	3	0		male	[null]	0	0	2631	7.225	[null]	C	[null]	[null]
91	3	0		male	40.5	0	0	C.A. 6212	15.1	[null]	S	[null]	187
92	3	0		male	40.5	0	0	367232	7.75	[null]	Q	[null]	68
93	3	0		male	18.0	0	0	350036	7.7958	[null]	S	[null]	[null]
94	3	0		female	[null]	0	0	364859	7.75	[null]	Q	[null]	[null]
95	3	0		male	[null]	0	0	349254	7.8958	[null]	C	[null]	[null]
96	3	0		male	40.0	1	6	CA 2144	46.9	[null]	S	[null]	[null]
97	3	0		male	51.0	0	0	21440	8.05	[null]	S	[null]	[null]
98	3	0		male	26.0	1	0	350025	7.8542	[null]	S	[null]	[null]
99	3	0		female	[null]	0	0	382649	7.75	[null]	Q	[null]	[null]
100	3	0		male	[null]	0	0	349220	7.8958	[null]	S	[null]	[null]

Rows: 1-100 | Columns: 14

Supposing we are only interested in the ‘survived’, ‘pclass’, ‘age’, ‘parch’, and ‘sibsp’ columns, we can create a vDataFrame with just those columns by specifying them in the usecols parameter:

[3]:

import verticapy as vp

vdf = vp.vDataFrame("public.titanic",
                    usecols = ["survived", "pclass", "age", "parch", "sibsp"])
display(vdf)

	123 pclass Integer	123 survived Integer	123 age Numeric(8)	123 sibsp Integer	123 parch Integer
1	1	0	71.0	0	0
2	1	0	17.0	0	0
3	1	0	27.0	1	0
4	1	0	37.0	1	1
5	1	0	[null]	0	0
6	1	0	31.0	1	0
7	1	0	[null]	0	0
8	1	0	37.0	1	0
9	1	0	24.0	0	0
10	1	0	45.0	1	0
11	1	0	42.0	0	0
12	1	0	[null]	0	0
13	1	0	41.0	1	0
14	1	0	[null]	0	0
15	1	0	29.0	0	0
16	1	0	47.0	0	0
17	1	0	58.0	0	2
18	1	0	50.0	1	0
19	1	0	57.0	1	0
20	1	0	64.0	1	0
21	1	0	51.0	0	1
22	1	1	53.0	2	0
23	1	1	36.0	0	1
24	1	1	14.0	1	2
25	1	1	[null]	0	1
26	1	1	36.0	0	2
27	1	1	[null]	0	0
28	1	1	35.0	0	0
29	1	1	22.0	0	1
30	1	1	25.0	1	0
31	1	1	35.0	1	0
32	1	1	39.0	0	0
33	1	1	37.0	1	0
34	1	1	30.0	1	0
35	1	1	31.0	1	0
36	1	1	22.0	1	0
37	1	1	43.0	0	1
38	1	1	33.0	0	0
39	1	1	35.0	0	0
40	1	1	48.0	1	0
41	1	1	60.0	1	0
42	2	0	44.0	1	0
43	2	0	29.0	1	0
44	2	0	36.0	0	0
45	2	0	30.0	0	0
46	2	0	44.0	0	0
47	2	0	21.0	0	0
48	2	0	47.0	0	0
49	2	0	22.0	2	0
50	2	0	31.0	0	0
51	2	0	[null]	0	0
52	2	0	35.0	0	0
53	2	0	41.0	0	0
54	2	1	4.0	2	1
55	2	1	0.83	0	2
56	2	1	17.0	0	0
57	2	1	24.0	1	1
58	2	1	50.0	0	1
59	2	1	33.0	0	2
60	2	1	30.0	3	0
61	2	1	31.0	0	0
62	2	1	31.0	0	0
63	2	1	12.0	0	0
64	3	0	42.0	0	0
65	3	0	26.0	0	0
66	3	0	32.0	0	0
67	3	0	22.0	0	0
68	3	0	6.0	1	1
69	3	0	9.0	1	1
70	3	0	19.0	0	0
71	3	0	28.0	0	0
72	3	0	29.0	0	0
73	3	0	20.0	0	0
74	3	0	27.0	0	0
75	3	0	30.0	0	0
76	3	0	27.0	0	0
77	3	0	22.0	0	0
78	3	0	32.0	0	0
79	3	0	23.0	1	0
80	3	0	16.0	0	0
81	3	0	18.0	2	2
82	3	0	33.0	1	1
83	3	0	41.0	0	0
84	3	0	19.0	0	0
85	3	0	21.0	0	0
86	3	0	22.0	0	0
87	3	0	29.0	0	0
88	3	0	30.0	0	0
89	3	0	22.0	2	0
90	3	0	[null]	0	0
91	3	0	36.0	1	0
92	3	0	[null]	0	0
93	3	0	[null]	0	0
94	3	0	55.5	0	0
95	3	0	21.0	0	0
96	3	0	[null]	0	0
97	3	0	22.0	0	0
98	3	0	[null]	0	0
99	3	0	28.0	0	0
100	3	0	20.0	0	0

Rows: 1-100 | Columns: 5

If we run the avg() method without specifying columns, all numerical vDataFrame columns are included in the operation:

NOTE: To examine the generated SQL for each command, turn on the “sql_on” option.

[4]:

vp.set_option("sql_on", True)
vdf.avg()

Computing the different aggregations.

SELECT
/*+LABEL('vDataframe.aggregate')*/ AVG("pclass"),
AVG("survived"),
AVG("age"),
AVG("sibsp"),
AVG("parch")
FROM
(
SELECT
"pclass",
"survived",
"age",
"sibsp",
"parch"
FROM
"public"."titanic")
VERTICAPY_SUBTABLE
LIMIT 1

[4]:

	avg
"pclass"	2.28444084278768
"survived"	0.364667747163695
"age"	30.1524573721163
"sibsp"	0.504051863857374
"parch"	0.378444084278768

Rows: 1-5 | Columns: 2

To restrict the operation to specific columns in the vDataFrame, provide the column names in the columns parameter:

[5]:

vdf.avg(columns = ["age", "survived"])

[5]:

	avg
"age"	30.1524573721163
"survived"	0.364667747163695

Rows: 1-2 | Columns: 2

As we are working with a small dataset, the perfomance impact of excluding unncessary columns is not very significant. However, with large datasets (e.g. greater than a TB), the impact is much greater, and choosing essential columns becomes a key step in improving performance.

Instead of specifying essential columns to include, some methods allow you to list the columns to exclude with the exclude_columns parameter:

[6]:

vdf.numcol(exclude_columns = ["parch", "sibsp"])

[6]:

['"pclass"', '"survived"', '"age"']

NOTE: To list all columns in a vDataFrame, including non-numerical columns, use the get_columns() method.

You can then use this truncated list of columns in another method call; for instance, to compute a correlation matrix:

[7]:

vdf.corr(columns = vdf.numcol(exclude_columns = ["parch", "sibsp"]))

Computing the pearson Corr Matrix.

SELECT
/*+LABEL('vDataframe._aggregate_matrix')*/ CORR_MATRIX("pclass", "survived", "age") OVER ()
FROM
(
SELECT
"pclass",
"survived",
"age",
"sibsp",
"parch"
FROM
"public"."titanic")
VERTICAPY_SUBTABLE

../../../_images/notebooks_introduction_best-practices_index_17_2.png

To turn off the SQL code generation option:

[8]:

vp.set_option("sql_on", False)

2. Save the current relation#

The vDataFrame works like a view, a stored query that encapsulates one or more SELECT statements. If the generated relation uses many different functions, the computation time for each method call is greatly increased.

Small transformations don’t drastically slow down computation, but heavy transformations (multiple joins, frequent use of advanced analytical funcions, moving windows, etc.) can result in noticeable slowdown. When performing computationally expensive operations, you can aid performance by saving the vDataFrame structure as a table in the Vertica database. We will demonstrate this process in the following example.

First, create a vDataFrame, then perform some operations on that vDataFrame:

[9]:

vdf = vp.vDataFrame("public.titanic")
vdf["sex"].label_encode()["boat"].fillna(method = "0ifnull")["name"].str_extract(
    ' ([A-Za-z]+)\.').eval("family_size", expr = "parch + sibsp + 1").drop(
    columns = ["cabin", "body", "ticket", "home.dest"])["fare"].fill_outliers().fillna()
print(vdf.current_relation())

795 elements were filled.
(
   SELECT
     "pclass",
     "survived",
     "name",
     "sex",
     "age",
     "sibsp",
     "parch",
     COALESCE("fare", 32.9113074018842) AS "fare",
     "embarked",
     "boat",
     "family_size"
   FROM
 (
   SELECT
     "pclass",
     "survived",
     REGEXP_SUBSTR("name", ' ([A-Za-z]+)\.') AS "name",
     DECODE("sex", 'female', 0, 'male', 1, 2) AS "sex",
     COALESCE("age", 30.1524573721163) AS "age",
     "sibsp",
     "parch",
     (CASE WHEN "fare" < -176.6204982585513 THEN -176.6204982585513 WHEN "fare" > 244.5480856064831 THEN 244.5480856064831 ELSE "fare" END) AS "fare",
     COALESCE("embarked", 'S') AS "embarked",
     DECODE("boat", NULL, 0, 1) AS "boat",
     parch + sibsp + 1 AS "family_size"
   FROM
 (

   SELECT

                    "pclass",
     "survived",
     "name",
     "sex",
     "age",
     "sibsp",
     "parch",
     "fare",
     "embarked",
     "boat"

   FROM
 "public"."titanic")
VERTICAPY_SUBTABLE)
VERTICAPY_SUBTABLE)
VERTICAPY_SUBTABLE

To understand how Vertica executes the different aggregations in the above relation, let’s take a look at the query plan:

NOTE: Query plans can be hard to interpret if you don’t know how to parse them. For more information, see query plan information and structure.

[10]:

print(vdf.explain())

------------------------------
QUERY PLAN DESCRIPTION:

EXPLAIN SELECT /*+LABEL('vDataframe.explain')*/ * FROM (SELECT "pclass", "survived", "name", "sex", "age", "sibsp", "parch", COALESCE("fare", 32.9113074018842) AS "fare", "embarked", "boat", "family_size" FROM (SELECT "pclass", "survived", REGEXP_SUBSTR("name", ' ([A-Za-z]+)\.') AS "name", DECODE("sex", 'female', 0, 'male', 1, 2) AS "sex", COALESCE("age", 30.1524573721163) AS "age", "sibsp", "parch", (CASE WHEN "fare" < -176.6204982585513 THEN -176.6204982585513 WHEN "fare" > 244.5480856064831 THEN 244.5480856064831 ELSE "fare" END) AS "fare", COALESCE("embarked", 'S') AS "embarked", DECODE("boat", NULL, 0, 1) AS "boat", parch + sibsp + 1 AS "family_size" FROM ( SELECT "pclass", "survived", "name", "sex", "age", "sibsp", "parch", "fare", "embarked", "boat" FROM "public"."titanic") VERTICAPY_SUBTABLE) VERTICAPY_SUBTABLE) VERTICAPY_SUBTABLE

Access Path:
+-STORAGE ACCESS for titanic [Cost: 129, Rows: 1K (NO STATISTICS)] (PATH ID: 1)
|  Projection: public.titanic_super
|  Materialize: titanic.pclass, titanic.survived, titanic.name, titanic.sex, titanic.age, titanic.sibsp, titanic.parch, titanic.fare, titanic.embarked, titanic.boat
|  Execute on: v_vertica_eon_node0001, v_vertica_eon_node0002, v_vertica_eon_node0003


-----------------------------------------------
PLAN: BASE QUERY PLAN (GraphViz Format)
-----------------------------------------------
digraph G {
graph [rankdir=BT, label = "BASE QUERY PLAN
        Query: EXPLAIN SELECT /*+LABEL(\'vDataframe.explain\')*/ * FROM (SELECT \"pclass\", \"survived\", \"name\", \"sex\", \"age\", \"sibsp\", \"parch\", COALESCE(\"fare\", 32.9113074018842) AS \"fare\", \"embarked\", \"boat\", \"family_size\" FROM (SELECT \"pclass\", \"survived\", REGEXP_SUBSTR(\"name\", \' ([A-Za-z]+)\.\') AS \"name\", DECODE(\"sex\", \'female\', 0, \'male\', 1, 2) AS \"sex\", COALESCE(\"age\", 30.1524573721163) AS \"age\", \"sibsp\", \"parch\", (CASE WHEN \"fare\" \< -176.6204982585513 THEN -176.6204982585513 WHEN \"fare\" \> 244.5480856064831 THEN 244.5480856064831 ELSE \"fare\" END) AS \"fare\", COALESCE(\"embarked\", \'S\') AS \"embarked\", DECODE(\"boat\", NULL, 0, 1) AS \"boat\", parch + sibsp + 1 AS \"family_size\" FROM ( SELECT \"pclass\", \"survived\", \"name\", \"sex\", \"age\", \"sibsp\", \"parch\", \"fare\", \"embarked\", \"boat\" FROM \"public\".\"titanic\") VERTICAPY_SUBTABLE) VERTICAPY_SUBTABLE) VERTICAPY_SUBTABLE

        All Nodes Vector:

          node[0]=v_vertica_eon_node0001 (initiator) Up
          node[1]=v_vertica_eon_node0002 (executor) Up
          node[2]=v_vertica_eon_node0003 (executor) Up
          node[3]=v_vertica_eon_node0004 (executor) Down
          node[4]=v_vertica_eon_node0005 (executor) Down
          node[5]=v_vertica_eon_node0006 (executor) Down
          node[6]=v_vertica_eon_node0007 (executor) Down
          node[7]=v_vertica_eon_node0008 (executor) Down
          node[8]=v_vertica_eon_node0009 (executor) Down

        Participating subscriptions:

          replica: [<1, 2, 0>]
          segment0001: [<0>]
          segment0002: [<1>]
          segment0003: [<2>]
          segment0004: [<0>]
          segment0005: [<1>]
          segment0006: [<2>]
          segment0007: [<0>]
          segment0008: [<1>]
          segment0009: [<2>]
        ", labelloc=t, labeljust=l ordering=out]
0[label = "Root
        OutBlk=[UncTuple(11)]", color = "green", shape = "house"];
1[label = "NewEENode
        OutBlk=[UncTuple(11)]", color = "green", shape = "box"];
2[label = "Recv
        Recv from: v_vertica_eon_node0001, v_vertica_eon_node0002, v_vertica_eon_node0003
        Net id: 1000

        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Varchar(164)
        Unc: Integer(8)
        Unc: Numeric(16, 13)
        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Numeric(18, 13)
        Unc: Varchar(20)
        Unc: Integer(8)
        Unc: Integer(8)", color = "green", shape = "box"];
3[label = "Send
        Send to: v_vertica_eon_node0001
        Net id: 1000

        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Varchar(164)
        Unc: Integer(8)
        Unc: Numeric(16, 13)
        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Numeric(18, 13)
        Unc: Varchar(20)
        Unc: Integer(8)
        Unc: Integer(8)", color = "green", shape = "box"];
4[label = "StorageUnionStep: titanic_super
        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Varchar(164)
        Unc: Integer(8)
        Unc: Numeric(16, 13)
        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Numeric(18, 13)
        Unc: Varchar(20)
        Unc: Integer(8)
        Unc: Integer(8)", color = "purple", shape = "box"];
5[label = "ExprEval:
          titanic.pclass
          titanic.survived
          regexp_substr(titanic.name, E\' ([A-Za-z]+)\\.\', 1, 1, \'\', 0)
          CASE titanic.sex WHEN NULLSEQUAL \'female\' THEN 0 WHEN NULLSEQUAL \'male\' THEN 1 ELSE 2 END
          coalesce(titanic.age, 30.1524573721163)
          titanic.sibsp
          titanic.parch
          coalesce(CASE WHEN (titanic.fare \< (-176.6204982585513)) THEN (-176.6204982585513) WHEN (titanic.fare \> 244.5480856064831) THEN 244.5480856064831 ELSE titanic.fare END, 32.9113074018842)
          coalesce(titanic.embarked, \'S\')
          CASE titanic.boat WHEN NULLSEQUAL NULL THEN 0 ELSE 1 END
          ((titanic.parch + titanic.sibsp) + 1)
        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Varchar(164)
        Unc: Integer(8)
        Unc: Numeric(16, 13)
        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Numeric(18, 13)
        Unc: Varchar(20)
        Unc: Integer(8)
        Unc: Integer(8)", color = "brown", shape = "box"];
6[label = "ScanStep: titanic_super
        pclass
        survived
        name
        sex
        age
        sibsp
        parch
        fare
        embarked
        boat
        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Varchar(164)
        Unc: Varchar(20)
        Unc: Numeric(6, 3)
        Unc: Integer(8)
        Unc: Integer(8)
        Unc: Numeric(10, 5)
        Unc: Varchar(20)
        Unc: Varchar(100)", color = "brown", shape = "box"];
1->0 [label = "V[0] C=11", color = "black", style="bold", arrowtail="inv"];
2->1 [label = "0", color = "blue"];
3->2 [label = "0", color = "blue"];
4->3 [label = "0", color = "blue"];
5->4 [label = "0", color = "blue"];
6->5 [label = "0", color = "blue"];}

Looking at the plan and its associated relation, it’s clear that the transformations we applied to the vDataFrame result in a complicated relation. Each method call to the vDataFrame must use this relation for computation.

NOTE: To better understand your queries, check out the `QueryProfiler <https://www.vertica.com/python/documentation/1.0.x/html/api/verticapy.performance.vertica.qprof.QueryProfiler.html>`__ function.

To save the relation as a table in the Vertica and replace the current relation in VerticaPy with the new table relation, use the `to_db() <https://www.vertica.com/python/documentation/1.0.x/html/verticapy.vDataFrame.to_db.html>`__ method with the inplace parameter set to True:

[11]:

vp.drop("public.titanic_clean", method = "table") # drops any existing table with the same schema and name
vdf.to_db("public.titanic_clean",
          relation_type = "table",
          inplace = True)
print(vdf.current_relation())

"public"."titanic_clean"

When dealing with very large datasets, it’s best to take caution before saving relations with complicated transformations. Ideally, you will perform a thorough data exploration, and only execute heavy transformations when essential.

3. Use the help function#

For a quick and convenient way to view information about an object or function, use the help() function:

[12]:

help(vp.connect)

Help on function connect in module verticapy.connection.connect:

connect(section: str, dsn: Optional[str] = None) -> None
    Connects to the database.

    Parameters
    ----------
    section: str
        Name of the section in the
        configuration file.
    dsn: str, optional
        Path to the file containing
        the credentials. If empty,
        the Connection File will be
        used.

    Examples
    --------
    Display all available connections:

    .. code-block:: python

        from verticapy.connection import available_connections

        available_connections()

    ``['VML', 'VerticaDSN', 'VerticaDSN_test']``

    Connect using the VerticaDSN connection:

    .. code-block:: python

        from verticapy.connection import connect

        connect("VerticaDSN")

    .. seealso::

        | :py:func:`~verticapy.connection.available_connections` :
            Displays all available connections.
        | :py:func:`~verticapy.connection.get_connection_file` :
            Gets the VerticaPy connection file.
        | :py:func:`~verticapy.connection.new_connection` :
            Creates a new VerticaPy connection.
        | :py:func:`~verticapy.connection.set_connection` :
            Sets the VerticaPy connection.

4. Close your connections#

Each connection to the database increases the concurrency on the system, so try to close connections when you’re done with them. VerticaPy simplifies the connection process by allowing the user to create an auto-connection, but the closing of connections must be done manually with the close_connection() function.

To demonstrate, create a database connection:

[ ]:

vp.connect("VerticaDSN")

When you are done making changes, close the connection with the close_connection() function:

[ ]:

vp.close_connection()

It is especially important to close connections when you are working in an environment with mutliple users.

5. Consider a method’s time complexity#

Some techniques are significantly more computationally expensive than others. For example, a Kendall correlation is very expensive compared to a Pearson correlation because, unlike Pearson, Kendall correlations use a cross join, resulting in a time complexity of O(n*n) (where n is the number of rows). Let’s compare the time needed to compute these two correlations on the ‘titanic’ dataset:

[13]:

import time

vdf = vp.vDataFrame("public.titanic")

start_time = time.time()
x = vdf.corr(method = "pearson", show = False)
print("Pearson, time: {0}".format(time.time() - start_time))

start_time = time.time()
x = vdf.corr(method = "kendall", show = False)
print("Kendall, time: {0}".format(time.time() - start_time))

Pearson, time: 0.17660832405090332

100%|██████████| 6/6 [00:00<00:00, 2943.03it/s]

Kendall, time: 0.857501745223999

The Kendall correlation matrix takes longer to compute than the Pearson matrix. As we are using a relatively small dataset, this difference is not very large, but it increases in magntiude with the size of the dataset. Taking into account the time complexity of operations is a key step in optimzing performance, esepcially with large datasets.

6. Limit plot elements#

Graphics are an essential tool to understand your data, but they can become difficult to parse if they contain too many elements. VerticaPy provides options that restrict plots to specified elements. To demonstrate, let’s first draw a multi-histogram with a categorical column with thousands of categories:

[3]:

vdf.bar(["name", "survived"])

VerticaPy outputs the bar chart, but the number of categories makes the graph basically incomprehensible. Instead, whenever possible, try to create graphics with as few categories as possible for your use case:

[4]:

vdf.hist(["pclass", "survived"])

To view the cardinality of your variables, use the nunique() method:

[17]:

vdf.nunique()

[17]:

	approx_unique
"pclass"	3.0
"survived"	2.0
"name"	1233.0
"sex"	2.0
"age"	96.0
"sibsp"	7.0
"parch"	8.0
"ticket"	888.0
"fare"	275.0
"cabin"	181.0
"embarked"	3.0
"boat"	26.0
"body"	118.0
"home.dest"	355.0

Rows: 1-14 | Columns: 2

7. Filter unnecessary data#

Filtering your data is a crucial step in data preparation. Proper filtering avoids unnecessary computations and greatly improves the performance of each method call. While the performance impact can be minimal for small datasets, filtering large datasets is key to improving performance.

For example, if we are only interested in analyzing Titanic passengers who didn’t have a lifeboat, we can filter on this requirement using the `filter() <https://www.vertica.com/python/documentation/1.0.x/html/verticapy.vDataFrame.filter.html>`__ method:

[18]:

vdf.filter("boat IS NOT NULL")

795 elements were filtered.

[18]:

	123 pclass Integer	123 survived Integer	Abc Varchar(164)	Abc sex Varchar(20)	123 age Numeric(8)	123 sibsp Integer	123 parch Integer	Abc ticket Varchar(36)	123 fare Numeric(12)	Abc cabin Varchar(30)	Abc embarked Varchar(20)	Abc boat Varchar(100)	123 body Integer	Abc Varchar(100)
1	1	1		male	37.0	1	1	11751	52.5542	D35	S	5	[null]
2	1	1		male	26.0	0	0	111369	30.0	C148	C	5	[null]
3	1	1		female	35.0	0	0	PC 17760	135.6333	C99	S	8	[null]
4	1	1		female	60.0	0	0	11813	76.2917	D15	C	8	[null]
5	1	1		male	45.0	0	0	PC 17594	29.7	A9	C	7	[null]
6	1	1		female	27.0	1	1	PC 17558	247.5208	B58 B60	C	6	[null]
7	1	1		female	35.0	1	0	113803	53.1	C123	S	D	[null]
8	1	1		male	53.0	0	0	113780	28.5	C51	C	B	[null]
9	1	1		female	19.0	0	0	112053	30.0	B42	S	3	[null]
10	1	1		female	58.0	0	1	PC 17582	153.4625	C125	S	3	[null]
11	1	1		male	25.0	1	0	11765	55.4417	E50	C	5	[null]
12	1	1		male	[null]	0	0	16988	30.0	D45	S	3	[null]
13	1	1		female	18.0	1	0	113773	53.1	D30	S	10	[null]
14	1	1		female	30.0	0	0	12749	93.5	B73	S	3	[null]
15	1	1		female	54.0	1	0	PC 17603	59.4	[null]	C	6	[null]
16	1	1		female	18.0	2	2	PC 17608	262.375	B57 B59 B63 B66	C	4	[null]
17	1	1		female	43.0	1	0	11778	55.4417	C116	C	5	[null]
18	1	1		female	39.0	1	1	110413	79.65	E67	S	8	[null]
19	1	1		female	39.0	1	1	17421	110.8833	C68	C	4	[null]
20	1	1		female	55.0	0	0	PC 17760	135.6333	C32	C	8	[null]
21	2	1		male	1.0	2	1	230136	39.0	F4	S	11	[null]
22	2	1		female	20.0	1	0	236853	26.0	[null]	S	12	[null]
23	2	1		female	45.0	0	2	237789	30.0	[null]	S	12	[null]
24	2	1		female	27.0	1	0	SC/PARIS 2149	13.8583	[null]	C	12	[null]
25	2	1		female	54.0	1	3	29105	23.0	[null]	S	4	[null]
26	2	1		male	42.0	0	0	237798	13.0	[null]	S	10	[null]
27	2	1		female	[null]	0	0	226593	12.35	E101	Q	10	[null]
28	2	1		female	34.0	0	0	C.A. 34260	10.5	F33	S	14	[null]
29	2	1		male	29.0	0	0	SC/PARIS 2147	13.8583	[null]	C	9	[null]
30	2	1		male	0.83	1	1	29106	18.75	[null]	S	4	[null]
31	2	1		male	2.0	1	1	29103	23.0	[null]	S	14	[null]
32	3	1		female	19.0	1	0	350046	7.8542	[null]	S	16	[null]
33	3	1		female	22.0	0	0	334914	7.725	[null]	Q	13	[null]
34	3	1		male	45.0	0	0	7598	8.05	[null]	S	15	[null]
35	3	1		male	1.0	1	2	C.A. 2315	20.575	[null]	S	10	[null]
36	3	1		male	[null]	0	0	SOTON/O.Q. 3101308	7.05	[null]	S	15	[null]
37	3	1		female	45.0	1	0	350026	14.1083	[null]	S	11	[null]
38	3	1		female	27.0	0	1	392096	12.475	E121	S	14	[null]
39	3	1		female	[null]	1	0	371110	24.15	[null]	Q	16	[null]
40	3	1		female	[null]	1	0	367230	15.5	[null]	Q	16	[null]
41	3	1		female	26.0	0	0	347470	7.8542	[null]	S	13	[null]
42	3	1		male	25.0	0	0	345768	9.5	[null]	S	11	[null]
43	3	1		female	[null]	0	0	335432	7.7333	[null]	Q	13	[null]
44	1	1		female	32.0	0	0	11813	76.2917	D15	C	8	[null]
45	1	1		female	47.0	1	1	11751	52.5542	D35	S	5	[null]
46	1	1		female	44.0	0	0	PC 17610	27.7208	B4	C	6	[null]
47	1	1		female	36.0	1	2	113760	120.0	B96 B98	S	4	[null]
48	1	1		female	36.0	0	0	PC 17608	262.375	B61	C	4	[null]
49	1	1		female	39.0	1	1	PC 17756	83.1583	E49	C	14	[null]
50	1	1		female	38.0	1	0	PC 17599	71.2833	C85	C	4	[null]
51	1	1		female	33.0	0	0	113781	151.55	[null]	S	8	[null]
52	1	1		female	27.0	1	2	F.C. 12750	52.0	B71	S	3	[null]
53	1	1		female	24.0	3	2	19950	263.0	C23 C25 C27	S	10	[null]
54	1	1		male	49.0	1	0	17453	89.1042	C92	C	5	[null]
55	1	1		female	16.0	0	1	111361	57.9792	B18	C	4	[null]
56	1	1		female	30.0	0	0	PC 17761	106.425	[null]	C	2	[null]
57	1	1		female	35.0	1	0	13236	57.75	C28	C	11	[null]
58	1	1		female	[null]	0	0	PC 17585	79.2	[null]	C	D	[null]
59	1	1		male	[null]	0	0	19947	35.5	C52	S	D	[null]
60	2	1		female	28.0	1	0	P/PP 3381	24.0	[null]	C	10	[null]
61	2	1		male	34.0	0	0	248698	13.0	D56	S	13	[null]
62	2	1		male	24.0	0	0	28034	10.5	[null]	S	9	[null]
63	2	1		male	8.0	0	2	28220	32.5	[null]	S	10	[null]
64	2	1		female	24.0	2	1	243847	27.0	[null]	S	12	[null]
65	2	1		male	2.0	1	1	230080	26.0	F2	S	D	[null]
66	2	1		male	22.0	0	0	W./C. 14260	10.5	[null]	S	13	[null]
67	2	1		female	50.0	0	0	W./C. 14258	10.5	[null]	S	13	[null]
68	2	1		female	50.0	0	0	F.C.C. 13531	10.5	[null]	S	9	[null]
69	2	1		female	4.0	1	1	29103	23.0	[null]	S	14	[null]
70	3	1		female	17.0	4	2	3101281	7.925	[null]	S	D	[null]
71	3	1		female	18.0	0	0	A/4 31416	8.05	[null]	S	C	[null]
72	3	1		female	[null]	1	0	386525	16.1	[null]	S	16	[null]
73	3	1		male	[null]	0	0	1601	56.4958	[null]	S	13	[null]
74	3	1		male	[null]	0	0	1601	56.4958	[null]	S	C	[null]
75	3	1		male	26.0	0	0	347070	7.775	[null]	S	15	[null]
76	3	1		male	[null]	0	0	1601	56.4958	[null]	S	C	[null]
77	3	1		female	22.0	0	0	C 7077	7.25	[null]	S	13	[null]
78	3	1		male	20.0	1	0	STON/O 2. 3101285	7.925	[null]	S	15	[null]
79	3	1		female	24.0	0	0	382653	7.75	[null]	Q	15	[null]
80	3	1		female	[null]	0	0	36568	15.5	[null]	Q	16	[null]
81	3	1		male	20.0	1	1	2653	15.7417	[null]	C	C	[null]
82	3	1		male	12.0	1	0	2651	11.2417	[null]	C	C	[null]
83	3	1		male	25.0	1	0	347083	7.775	[null]	S	15	[null]
84	3	1		female	[null]	1	1	2668	22.3583	F E69	C	D	[null]
85	3	1		female	23.0	0	0	CA. 2314	7.55	[null]	S	C	[null]
86	3	1		female	[null]	1	0	376564	16.1	[null]	S	10	[null]
87	1	0		male	36.0	0	0	13050	75.2417	C6	C	A	[null]
88	1	1		female	29.0	0	0	24160	211.3375	B5	S	2	[null]
89	1	1		male	48.0	0	0	19952	26.55	E12	S	3	[null]
90	1	1		male	28.0	0	0	110564	26.55	C52	S	D	[null]
91	1	1		female	33.0	1	0	113806	53.1	E8	S	5	[null]
92	1	1		female	30.0	0	0	110152	86.5	B77	S	8	[null]
93	1	1		female	64.0	1	1	112901	26.55	B26	S	7	[null]
94	1	1		male	27.0	0	0	113804	30.5	[null]	S	3	[null]
95	1	1		female	17.0	1	0	17474	57.0	B20	S	3	[null]
96	1	1		male	53.0	1	1	33638	81.8583	A34	S	13	[null]
97	1	1		female	[null]	1	0	PC 17611	133.65	[null]	S	5	[null]
98	1	1		female	51.0	1	0	13502	77.9583	D11	S	10	[null]
99	1	1		female	33.0	1	0	19928	90.0	C78	Q	14	[null]
100	1	1		female	19.0	0	2	11752	26.2833	D47	S	5	[null]

Rows: 1-100 of 439 | Columns: 14

To drop unnecessary columns from your vDataFrame, use the `drop() <https://www.vertica.com/python/documentation/1.0.x/html/verticapy.vDataFrame.drop.html>`__ method:

[19]:

vdf.drop(["name", "body"])

[19]:

	123 pclass Integer	123 survived Integer	Abc sex Varchar(20)	123 age Numeric(8)	123 sibsp Integer	123 parch Integer	Abc ticket Varchar(36)	123 fare Numeric(12)	Abc cabin Varchar(30)	Abc embarked Varchar(20)	Abc boat Varchar(100)	Abc Varchar(100)
1	1	1	male	0.92	1	2	113781	151.55	C22 C26	S	11
2	1	1	female	18.0	1	0	PC 17757	227.525	C62 C64	C	4
3	1	1	male	80.0	0	0	27042	30.0	A23	S	B
4	1	1	female	42.0	0	0	PC 17757	227.525	[null]	C	4
5	1	1	male	25.0	1	0	11967	91.0792	B49	C	7
6	1	1	female	45.0	0	0	PC 17608	262.375	[null]	C	4
7	1	1	female	22.0	0	1	113505	55.0	E33	S	6
8	1	1	female	[null]	0	0	17770	27.7208	[null]	C	5
9	1	1	female	22.0	0	0	113781	151.55	[null]	S	11
10	1	1	female	64.0	0	2	PC 17756	83.1583	E45	C	14
11	1	1	female	54.0	1	0	36947	78.2667	D20	C	4
12	1	1	male	43.0	1	0	17765	27.7208	D40	C	5
13	1	1	female	22.0	0	2	13568	49.5	B39	C	5
14	1	1	male	23.0	0	1	PC 17759	63.3583	D10 D12	C	7
15	1	1	female	35.0	1	0	113789	52.0	[null]	S	8
16	1	1	female	[null]	1	0	17464	51.8625	D21	S	8
17	1	1	male	42.0	1	0	11753	52.5542	D19	S	5
18	1	1	female	45.0	1	0	11753	52.5542	D19	S	5
19	1	1	female	16.0	0	1	PC 17592	39.4	D28	S	9
20	1	1	female	21.0	0	0	13502	77.9583	D9	S	10
21	1	1	male	36.0	0	0	PC 17473	26.2875	E25	S	7
22	1	1	male	[null]	0	0	F.C. 12998	25.7417	[null]	C	7
23	1	1	female	33.0	0	0	PC 17613	27.7208	A11	C	11
24	1	1	female	48.0	1	3	PC 17608	262.375	B57 B59 B63 B66	C	4
25	1	1	female	23.0	1	0	21228	82.2667	B45	S	7
26	1	1	female	31.0	0	2	36928	164.8667	C7	S	8
27	2	1	female	48.0	0	2	C.A. 33112	36.75	[null]	S	14
28	2	1	female	30.0	1	0	SC/PARIS 2148	13.8583	[null]	C	12
29	2	1	female	34.0	0	0	243880	13.0	[null]	S	12
30	2	1	female	24.0	0	2	250649	14.5	[null]	S	4
31	2	1	female	48.0	1	2	220845	65.0	[null]	S	9
32	2	1	female	3.0	1	2	SC/Paris 2123	41.5792	[null]	C	14
33	2	1	male	3.0	1	1	230080	26.0	F2	S	D
34	2	1	female	2.0	1	1	26360	26.0	[null]	S	11
35	2	1	female	18.0	0	2	250652	13.0	[null]	S	16
36	3	0	male	27.0	1	0	2659	14.4542	[null]	C	C
37	3	1	male	25.0	0	0	348122	7.65	F G63	S	A
38	3	1	male	24.0	0	0	S.O./P.P. 752	7.55	[null]	S	D
39	3	1	female	22.0	1	0	347072	13.9	[null]	S	16
40	3	1	female	[null]	0	0	A. 2. 39186	8.05	[null]	S	C
41	3	1	male	21.0	0	0	350034	7.7958	[null]	S	A
42	3	1	male	4.0	1	1	347742	11.1333	[null]	S	15
43	3	1	female	27.0	0	2	347742	11.1333	[null]	S	15
44	3	1	male	21.0	0	0	350053	7.7958	[null]	S	13
45	3	1	male	39.0	0	1	349256	13.4167	[null]	C	15
46	3	1	female	26.0	1	1	315153	22.025	[null]	S	2
47	3	1	male	25.0	0	0	2654	7.2292	F E57	C	10
48	3	1	female	[null]	0	0	36866	7.7375	[null]	Q	16
49	3	1	female	[null]	0	2	2668	22.3583	[null]	C	D
50	3	1	female	[null]	0	0	342712	8.05	[null]	S	C
51	3	1	female	21.0	0	0	343120	7.65	[null]	S	C
52	1	1	female	63.0	1	0	13502	77.9583	D7	S	10
53	1	1	female	58.0	0	0	113783	26.55	C103	S	8
54	1	1	female	41.0	0	0	16966	134.5	E40	C	3
55	1	1	female	53.0	0	0	PC 17606	27.4458	[null]	C	6
56	1	1	female	58.0	0	1	PC 17755	512.3292	B51 B53 B55	C	3
57	1	1	male	11.0	1	2	113760	120.0	B96 B98	S	4
58	1	1	female	76.0	1	0	19877	78.85	C46	S	6
59	1	1	male	48.0	1	0	PC 17572	76.7292	D33	C	3
60	1	1	male	27.0	0	0	PC 17572	76.7292	D49	C	3
61	1	1	female	24.0	0	0	11767	83.1583	C54	C	7
62	1	1	female	52.0	1	1	12749	93.5	B69	S	3
63	1	1	female	44.0	0	1	111361	57.9792	B18	C	4
64	1	1	female	15.0	0	1	24160	211.3375	B5	S	2
65	1	1	female	39.0	0	0	PC 17758	108.9	C105	C	8
66	1	1	female	22.0	0	1	113509	61.9792	B36	C	5
67	1	1	male	52.0	0	0	113786	30.5	C104	S	6
68	1	1	male	45.0	1	1	16966	134.5	E34	C	3
69	1	1	female	40.0	1	1	16966	134.5	E34	C	3
70	1	1	female	35.0	0	0	PC 17755	512.3292	[null]	C	3
71	1	1	male	21.0	0	1	PC 17597	61.3792	[null]	C	A
72	2	0	male	34.0	1	0	31027	21.0	[null]	S	12
73	2	1	female	19.0	0	0	28404	13.0	[null]	S	12
74	2	1	female	25.0	1	1	237789	30.0	[null]	S	12
75	2	1	male	8.0	1	1	C.A. 33112	36.75	[null]	S	14
76	2	1	female	28.0	0	0	237668	13.0	[null]	S	13
77	2	1	female	29.0	1	0	2926	26.0	[null]	S	16
78	2	1	female	7.0	0	2	F.C.C. 13529	26.25	[null]	S	14
79	2	1	female	29.0	1	0	26707	26.0	[null]	S	10
80	2	1	female	24.0	1	0	244367	26.0	[null]	S	12
81	2	1	male	1.0	0	2	S.C./PARIS 2079	37.0042	[null]	C	10
82	2	1	female	28.0	0	0	230434	13.0	[null]	S	9
83	2	1	female	30.0	0	0	234818	12.35	[null]	Q	13
84	2	1	female	29.0	1	0	228414	26.0	[null]	S	10
85	2	1	female	0.92	1	2	C.A. 34651	27.75	[null]	S	10
86	2	1	female	24.0	1	0	SC/PARIS 2167	27.7208	[null]	C	12
87	3	1	female	16.0	0	0	348125	7.65	[null]	S	16
88	3	1	female	18.0	0	1	392091	9.35	[null]	S	13
89	3	1	female	5.0	4	2	347077	31.3875	[null]	S	15
90	3	1	female	30.0	0	0	364516	12.475	[null]	S	13
91	3	1	female	[null]	0	0	335677	7.75	[null]	Q	13
92	3	1	male	[null]	0	0	3470	7.8875	[null]	S	C
93	3	1	female	[null]	0	0	14313	7.75	[null]	Q	D
94	3	1	female	[null]	0	0	14312	7.75	[null]	Q	D
95	3	1	female	4.0	0	2	315153	22.025	[null]	S	2
96	3	1	female	[null]	0	0	330932	7.7875	[null]	Q	13
97	3	1	male	6.0	0	1	392096	12.475	E121	S	14
98	3	1	female	[null]	0	0	14311	7.75	[null]	Q	D
99	3	1	male	9.0	0	1	C 17368	3.1708	[null]	S	13
100	3	1	male	32.0	0	0	347079	7.775	[null]	S	A

Rows: 1-100 of 439 | Columns: 12

The dropped columns are excluded from the relation’s generated SELECT query:

[20]:

print(vdf.current_relation())

(
   SELECT
     *
   FROM
 (

   SELECT

                    "pclass",
     "survived",
     "sex",
     "age",
     "sibsp",
     "parch",
     "ticket",
     "fare",
     "cabin",
     "embarked",
     "boat",
     "home.dest"

   FROM
 "public"."titanic")
VERTICAPY_SUBTABLE
   WHERE (boat IS NOT NULL))
VERTICAPY_SUBTABLE

8. Maximize your resources#

Large datasets often contain hundreds of columns. These datasets require VerticaPy to compute many concurrent, resource-intensive aggregations. To limit the impact of these aggregations, you can control the number of queries that VerticaPy sends to the system, which allows for some useful optimizations.

In the following example, we’ll explore a couple of these optimizations. First, generate a dataset:

[21]:

from verticapy.datasets import gen_dataset

vp.drop("public.test_dataset", method= "table") # drop an existing table with the same schema and name
features_ranges = {}
for i in range(20):
    features_ranges[f"x{i}"] = {"type": float, "range": [0, 1]}
vp.drop("test_dataset", method = "table")
vdf = gen_dataset(features_ranges, nrows = 100000).to_db("test_dataset",
                                                        relation_type = "table",
                                                        inplace = True)
display(vdf)

	123 x0 Float(22)	123 x1 Float(22)	123 x2 Float(22)	123 x3 Float(22)	123 x4 Float(22)	123 x5 Float(22)	123 x6 Float(22)	123 x7 Float(22)	123 x8 Float(22)	123 x9 Float(22)	123 x10 Float(22)	123 x11 Float(22)	123 x12 Float(22)	123 x13 Float(22)	123 x14 Float(22)	123 x15 Float(22)	123 x16 Float(22)	123 x17 Float(22)	123 x18 Float(22)	123 x19 Float(22)
1	7.07397703081369e-05	0.0787740496452898	0.115318223834038	0.675260862102732	0.454256983939558	0.612441120436415	0.710587008856237	0.656174514908344	0.929624147713184	0.794853629544377	0.287497493438423	0.693544203415513	0.208437475375831	0.756020798813552	0.772001166362315	0.892991768661886	0.248670872766525	0.491525070974603	0.66170282359235	0.0471792332828045
2	8.221878670156e-05	0.664888116065413	0.083972045686096	0.568629627348855	0.572966161184013	0.783750915434211	0.560252866242081	0.317204114282504	0.0637373190838844	0.134130723774433	0.458569357637316	0.789060909301043	0.694081055233255	0.684782675467432	0.728241338860244	0.148084834218025	0.675027774646878	0.22694622608833	0.677556065144017	0.687195596750826
3	0.000162084121257067	0.666780518833548	0.471131122438237	0.489033610327169	0.413613711716607	0.450382014503703	0.721551714930683	0.912379769841209	0.475080049596727	0.786981134908274	0.940750791225582	0.50642477418296	0.653000021819025	0.257536737946793	0.351418826729059	0.356383176054806	0.442061832174659	0.544653700198978	0.933788935421035	0.261729783844203
4	0.000172695610672235	0.538448342122138	0.0114666193258017	0.15314542199485	0.68178574857302	0.322206106502563	0.824309630319476	0.632614301284775	0.408781875390559	0.71936245681718	0.21350727416575	0.662019907031208	0.146308450959623	0.537607988808304	0.416217104066163	0.226778261363506	0.550271776737645	0.440038891974837	0.906709315488115	0.630714946193621
5	0.000215996988117695	0.440047195181251	0.403162910602987	0.649770956486464	0.759415340377018	0.6369431510102	0.662149624433368	0.944483974715695	0.411118406802416	0.506381524959579	0.968799884431064	0.93660238571465	0.15649419859983	0.567954865284264	0.0799761395901442	0.285689088748768	0.810797542799264	0.125172117492184	0.63526348141022	0.690739138750359
6	0.000235316576436162	0.731702069053426	0.0454661028925329	0.182908847462386	0.778532564872876	0.265417562564835	0.55734150740318	0.482840053969994	0.19379152241163	0.794033095473424	0.347327131545171	0.989177622832358	0.593036794103682	0.133990262635052	0.241174938390031	0.747349292039871	0.509102256270126	0.427678616018966	0.392592984251678	0.0654848476406187
7	0.000279249157756567	0.28437186894007	0.641281842254102	0.381525793578476	0.925038357963786	0.335285265231505	0.774883880047128	0.520290001993999	0.890124120982364	0.867110275430605	0.598176734521985	0.916347541147843	0.192046095384285	0.0858447118662298	0.217270190594718	0.097629978787154	0.756692398572341	0.00557059771381319	0.850608402863145	0.454450654564425
8	0.000304699409753084	0.633377789054066	0.492939325980842	0.203626787988469	0.942555516026914	0.657857748912647	0.605496536241844	0.626292253145948	0.931696797255427	0.98450671765022	0.49190244753845	0.628941232338548	0.89045607461594	0.436812720261514	0.561929893679917	0.297366416547447	0.152835283894092	0.267463113879785	0.86299324571155	0.923588463105261
9	0.000328358961269259	0.867208939278498	0.269665719009936	0.373476926004514	0.203031167155132	0.123003569431603	0.463740748818964	0.24328449065797	0.771818584762514	0.56956998561509	0.876726258080453	0.876385713694617	0.881894226418808	0.420320559293032	0.657797486986965	0.412749577080831	0.447113524423912	0.211105825845152	0.667003343580291	0.43986318516545
10	0.000365900108590722	0.596844155108556	0.674576356075704	0.91947619151324	0.379827807657421	0.899701782036573	0.52025105827488	0.809916381258518	0.473883915692568	0.213705440284684	0.2360272067599	0.139547993894666	0.0802076722029597	0.34740579710342	0.0541830838192254	0.745313023449853	0.548075758852065	0.31719421944581	0.919335188576952	0.320289980620146
11	0.000394462607800961	0.209307472920045	0.327199092600495	0.924387450329959	0.0160209310706705	0.753760276595131	0.822840189794078	0.49680331395939	0.171545246150345	0.251965128816664	0.793102910742164	0.710393627872691	0.232678772183135	0.347755238879472	0.759084188845009	0.42839247873053	0.673935937695205	0.527126732748002	0.106555996229872	0.263772482052445
12	0.000468057114630938	0.460981675423682	0.403460169443861	0.12962389853783	0.823186951223761	0.571866973536089	0.309102889616042	0.176998361479491	0.766989024821669	0.705847310135141	0.528325141873211	0.83062529982999	0.184090153779835	0.247428989969194	0.306534480070695	0.58936817990616	0.365556390490383	0.834892790066078	0.561741274083033	0.0581626649945974
13	0.000490116886794567	0.928313683951274	0.582617239793763	0.689145977143198	0.0357485278509557	0.316650076536462	0.538284422829747	0.545583179220557	0.500751954969019	0.152344487141818	0.291996974730864	0.648616448743269	0.237665006425232	0.426251831464469	0.66277192835696	0.976468745619059	0.84758166805841	0.323237158358097	0.610974005656317	0.179504566593096
14	0.000559100415557623	0.646318433340639	0.794918233528733	0.130381848430261	0.229351825080812	0.42663979367353	0.671490587526932	0.787137030856684	0.146217171568424	0.996617547236383	0.29423179384321	0.116407420719042	0.478859937051311	0.674083475023508	0.775861297501251	0.29623837932013	0.0322002198081464	0.979507745243609	0.54247380211018	0.165981483878568
15	0.000813818536698818	0.401159638306126	0.122791942209005	0.0811491112690419	0.371458741603419	0.989593782229349	0.171249753562734	0.422167409444228	0.636636234121397	0.39338878239505	0.931037221569568	0.589330062037334	0.739032665267587	0.717970880214125	0.291829257272184	0.525500290794298	0.824449382722378	0.117800996638834	0.27686498942785	0.539769044611603
16	0.000881037674844265	0.926563451532274	0.49403636623174	0.606777261244133	0.770239162025973	0.922334840986878	0.293788638431579	0.659072834998369	0.396147989900783	0.557868031784892	0.579218808561563	0.556756211211905	0.767120233271271	0.460390074644238	0.558031447697431	0.698243172373623	0.0655963046010584	0.241276160581037	0.0979691406246275	0.739939774386585
17	0.000904147746041417	0.00713742384687066	0.887279984075576	0.403491991572082	0.530341457109898	0.610861517954618	0.33497479185462	0.611500513739884	0.276584282051772	0.43097455217503	0.338759421836585	0.985681066755205	0.658270600251853	0.500626073218882	0.0287806298583746	0.244763401104137	0.493146955734119	0.358823155518621	0.558665160322562	0.199766838224605
18	0.000911583891138434	0.773990652291104	0.920283103827387	0.108230599202216	0.382268040673807	0.643023893935606	0.881540811387822	0.813924036454409	0.713228946086019	0.218545480864123	0.685787370195612	0.835733655141667	0.67833538283594	0.838469396345317	0.135901709320024	0.13111672597006	0.758714649826288	0.463491910602897	0.61335618281737	0.174273746088147
19	0.000941727310419083	0.00206689559854567	0.746966916369274	0.783018058631569	0.792191141750664	0.933956834953278	0.705415580421686	0.466654091374949	0.196242459118366	0.364395431708544	0.996139663970098	0.817020452581346	0.256331377429888	0.323085545096546	0.737441817997023	0.324295812053606	0.204035287722945	0.78107621264644	0.370023408904672	0.299085040576756
20	0.00103024975396693	0.428627724293619	0.78149398136884	0.333310010610148	0.471324097597972	0.178805551957339	0.165625331224874	0.0999236232601106	0.497906468110159	0.478547443402931	0.792894101468846	0.409282827749848	0.348093420499936	0.700531512033194	0.509646830381826	0.943777309497818	0.96923285885714	0.478293595835567	0.255895378766581	0.0177682114299387
21	0.00108601152896881	0.941462184302509	0.272163378773257	0.185871017165482	0.509520422667265	0.462705116253346	0.770795213058591	0.855878246715292	0.0998670184053481	0.165537059539929	0.507003561593592	0.31151592801325	0.965856424067169	0.0487796613015234	0.197327432688326	0.204643638804555	0.971329546999186	0.168989388272166	0.939109412254766	0.722158912802115
22	0.00108695239759982	0.777154640993103	0.620155370794237	0.809243672993034	0.7193431945052	0.237389110261574	0.308336935704574	0.828376793535426	0.247933047590777	0.219911796972156	0.819004198303446	0.379911294672638	0.529248681152239	0.899906970094889	0.473881952930242	0.800273945787922	0.243586545577273	0.949731273576617	0.445705038495362	0.841658351710066
23	0.00115349795669317	0.120015754830092	0.916747121373191	0.605290731182322	0.200641039758921	0.834948729723692	0.086277878144756	0.129435950890183	0.568009133217856	0.383186249760911	0.737167516024783	0.809659451013431	0.709416666766629	0.588457584148273	0.955320416018367	0.715534314513206	0.0272180251777172	0.748523120302707	0.741011689417064	0.817204979015514
24	0.00124836689792573	0.404006673255935	0.729188876226544	0.134905292652547	0.638447147095576	0.0326598910614848	0.875825939932838	0.518126977607608	0.284011701121926	0.944475379772484	0.270062699448317	0.455564816715196	0.0699151025619358	0.779402833431959	0.61873278208077	0.868116568308324	0.522123059723526	0.265373416943476	0.961012411862612	0.252335847122595
25	0.00161040597595274	0.797531974501908	0.727953300112858	0.620368633652106	0.827185711590573	0.966398877091706	0.14467933960259	0.285843878984451	0.482884235680103	0.565771921304986	0.586744515225291	0.139016248518601	0.64823877508752	0.18260705890134	0.544085418339819	0.159320603357628	0.845786469988525	0.401554244570434	0.58950429642573	0.828654535580426
26	0.00180388195440173	0.155339254299179	0.792725794017315	0.756788194878027	0.445429037325084	0.163453655783087	0.529277089051902	0.855168510926887	0.265460407128558	0.0713953387457877	0.107084215851501	0.212133470689878	0.862828412791714	0.729005594039336	0.958629258209839	0.977173406165093	0.323138889623806	0.150734570110217	0.663891891017556	0.884100564057007
27	0.00186728429980576	0.746374151203781	0.316366038983688	0.997330937534571	0.748660784447566	0.114794928114861	0.623178282752633	0.574027118971571	0.408452668460086	0.848881515674293	0.431748547125608	0.542989214183763	0.588508885120973	0.440957642160356	0.0172715813387185	0.63088421523571	0.270692675141618	0.844324083533138	0.279879358131438	0.325400947825983
28	0.00187112367711961	0.074215796077624	0.209501659730449	0.87649839790538	0.560154906939715	0.132593187736347	0.810229558730498	0.964394895127043	0.414751750649884	0.404893786180764	0.18348527001217	0.0345770684070885	0.0740700317546725	0.36832527210936	0.671524447156116	0.046171389054507	0.553925870917737	0.738648246275261	0.161384960170835	0.615539412479848
29	0.00195040018297732	0.8087735674344	0.901627759216353	0.477817892096937	0.212713369168341	0.307337669888511	0.0883674265351146	0.624326047720388	0.99146943888627	0.0235797860659659	0.362047810573131	0.708877760451287	0.964386200532317	0.375861238921061	0.869514229008928	0.27065479522571	0.816889233421534	0.29524426581338	0.409676425158978	0.599162093829364
30	0.00209484249353409	0.987763728713617	0.84027284453623	0.212210223544389	0.45631747180596	0.134294645627961	0.00533370021730661	0.55260439356789	0.789927584584802	0.0972621727269143	0.0962308479938656	0.471911519533023	0.344868397107348	0.947255346924067	0.330723610939458	0.416563589125872	0.680603050161153	0.552643134025857	0.128044051351026	0.571059670532122
31	0.00214434368535876	0.587025278480724	0.358269715216011	0.855862671742216	0.142912609502673	0.656568002188578	0.131265196716413	0.305069310357794	0.655289984773844	0.558698717271909	0.608658290002495	0.557965404121205	0.546369242016226	0.0497625421267003	0.155141055351123	0.575387650635093	0.649278603261337	0.406251986511052	0.146564882015809	0.0264485760126263
32	0.0021580820903182	0.0582028334029019	0.908730320865288	0.197044902713969	0.181442409753799	0.117233639582992	0.707306296564639	0.0332066854462028	0.733929901616648	0.313547425437719	0.567457454046234	0.252109538763762	0.476462761405855	0.747679452644661	0.394215121399611	0.7573907522019	0.522556410171092	0.577653749613091	0.215456620790064	0.139164109481499
33	0.00217105797491968	0.0717994018923491	0.846076355781406	0.613671284168959	0.285275529371575	0.887913216836751	0.204220998333767	0.0293521634303033	0.767697215545923	0.309881044086069	0.457838901784271	0.0218484336510301	0.562187176896259	0.94680851069279	0.995543398428708	0.791082461364567	0.0664631684776396	0.241565929725766	0.407018105033785	0.727160071954131
34	0.00234021944925189	0.36967738554813	0.63896293239668	0.457029976416379	0.474535513203591	0.717951817903668	0.389928099699318	0.843624292174354	0.890886607812718	0.44873271859251	0.159542276524007	0.537849633721635	0.526401352602988	0.320445043733343	0.912187336012721	0.152477400843054	0.585033761803061	0.918251604074612	0.170170170953497	0.31046407786198
35	0.00251702964305878	0.444963669404387	0.433616900350899	0.435525455977768	0.119558988604695	0.45242811832577	0.605638812296093	0.573070871643722	0.955572738079354	0.565208275569603	0.0954436720348895	0.262073143851012	0.407662492478266	0.382697479566559	0.0807600291445851	0.548337502405047	0.0384508804418147	0.780416399473324	0.752766839927062	0.698577198199928
36	0.00252918503247201	0.436820062110201	0.229361047269776	0.690372472628951	0.675582922063768	0.0390350648667663	0.26657758676447	0.351476528681815	0.768657222390175	0.615417398279533	0.231399021344259	0.634081738768145	0.519527753582224	0.499393659178168	0.176333518698812	0.280457898974419	0.0920732717495412	0.424766164505854	0.912588961189613	0.0552232270129025
37	0.00259770872071385	0.596480611944571	0.834001291077584	0.670721296686679	0.980687382165343	0.178607841953635	0.243839301401749	0.719743447611108	0.746492188656703	0.68649411899969	0.682428323430941	0.496295315679163	0.116068713366985	0.10597405442968	0.772418254520744	0.941502522444353	0.903444903902709	0.562685617944226	0.638337094569579	0.818067149724811
38	0.0027940021827817	0.264916215091944	0.958574424264953	0.0764231239445508	0.375031859846786	0.033055410021916	0.473060915013775	0.557890114374459	0.31443962501362	0.406700508901849	0.857767235254869	0.0298969331197441	0.27419364429079	0.918617191258818	0.717283584177494	0.593540908303112	0.321808034554124	0.396097376011312	0.256550882477313	0.683786167064682
39	0.00308672431856394	0.96410476998426	0.266360730864108	0.698160485131666	0.848827059613541	0.0606714021414518	0.0356115337926894	0.865328075597063	0.98101768665947	0.333825406618416	0.518536287359893	0.661198504036292	0.53448177408427	0.333366481587291	0.629202526295558	0.877430620603263	0.518266425700858	0.739694599527866	0.446443422697484	0.372738287318498
40	0.0031309612095356	0.786174891516566	0.877005259972066	0.0751339443959296	0.831889817025512	0.91450317716226	0.791548782028258	0.338991114171222	0.00389480870217085	0.830166463973001	0.14821662590839	0.775779550196603	0.576553233899176	0.699180886149406	0.939773844787851	0.668083934579045	0.896093739662319	0.431803109357134	0.124544948572293	0.574425296392292
41	0.00320252543315291	0.317497017327696	0.00492033758200705	0.251716795610264	0.238280158024281	0.411784172058105	0.445523543981835	0.863160006003454	0.10830110963434	0.253489604685456	0.637277951929718	0.0664254741277546	0.735603784443811	0.135514754336327	0.17075143288821	0.382505330489948	0.159300432307646	0.778517118655145	0.229798292508349	0.412966939387843
42	0.00344839505851269	0.41365549270995	0.262566076591611	0.455643562600017	0.895506259985268	0.00208698818460107	0.793284616200253	0.852926962310448	0.93254932994023	0.0426197666674852	0.7385973574128	0.0627716386225075	0.778598168864846	0.423927762079984	0.537435690872371	0.628365884535015	0.667497032089159	0.149428114760667	0.0422821058891714	0.239791556727141
43	0.00355346943251789	0.478814508067444	0.870360965607688	0.507097395602614	0.50485321925953	0.169637580402195	0.901601722231135	0.988537980243564	0.251347888726741	0.610513827065006	0.391642059432343	0.9173439797014	0.0930503366980702	0.658149466617033	0.772542469436303	0.144367508357391	0.298528500366956	0.0363043553661555	0.872198846191168	0.220158252399415
44	0.00359765719622374	0.928872890537605	0.99316005827859	0.795961018186063	0.495093082310632	0.3891280235257	0.0468005982693285	0.113113822648302	0.154580353526399	0.72520478349179	0.91036273073405	0.790702533908188	0.388680992182344	0.426584872417152	0.744255352765322	0.683546578045934	0.765603640582412	0.695656835800037	0.283917827066034	0.419625357026234
45	0.00360921234823763	0.531842291587964	0.724728477420285	0.609136287122965	0.446178850485012	0.453919425141066	0.923414060845971	0.113790378905833	0.72343031829223	0.0966935798060149	0.0734577272087336	0.0768244755454361	0.276032139081508	0.281680727144703	0.617480019340292	0.0339551146607846	0.944291460094973	0.251266735838726	0.23711613798514	0.634889496024698
46	0.00365079543553293	0.817615391453728	0.520933826686814	0.226314679253846	0.505093523301184	0.558282470330596	0.30954686878249	0.123459958937019	0.568857222562656	0.437137285014614	0.692113322671503	0.890789003809914	0.40133009897545	0.522017363924533	0.489631124539301	0.0996344615705311	0.672558885533363	0.202536996454	0.0133494408801198	0.536575390025973
47	0.00374495936557651	0.505060694180429	0.509272237773985	0.178100888384506	0.782253640936688	0.826623527798802	0.785702407127246	0.1362720427569	0.361139205284417	0.724397498648614	0.0370221026241779	0.476182386046275	0.40956871653907	0.8456665428821	0.79963996540755	0.7550399673637	0.209125438239425	0.762227368541062	0.337561910739169	0.912123567424715
48	0.00377884856425226	0.967814054107293	0.969440899323672	0.881649831542745	0.690031155245379	0.865835349773988	0.122613899409771	0.902668681927025	0.0748221722897142	0.924740245565772	0.880305877653882	0.299106487073004	0.467194598400965	0.390335363335907	0.38866821792908	0.604087451007217	0.195586528396234	0.952861298574135	0.0704811403993517	0.832441444741562
49	0.00381363835185766	0.0941376029513776	0.574694714508951	0.550449605798349	0.0386752742342651	0.234482721425593	0.382234157295898	0.607023980468512	0.884521545143798	0.872739625629038	0.977535909740254	0.913787788711488	0.359378568362445	0.949293674901128	0.0819845686201006	0.986360437003896	0.70509942015633	0.760160259436816	0.109000823227689	0.112887593684718
50	0.00397079810500145	0.212073437171057	0.48211749875918	0.883684633066878	0.397640393581241	0.487227794481441	0.00920038437470794	0.502992056543007	0.373318474041298	0.406748903682455	0.0770342068281025	0.924326147418469	0.476043978240341	0.308995440835133	0.354535515420139	0.0424741832539439	0.057277743704617	0.988012937130406	0.511029038811103	0.413184882607311
51	0.00403795694001019	0.619252756470814	0.0578359025530517	0.143490813905373	0.236446519149467	0.988320326898247	0.385204897960648	0.502203524112701	0.737451476510614	0.0810965462587774	0.311949507100508	0.125118662603199	0.19885065429844	0.850292703602463	0.590110443066806	0.980088412063196	0.22886746446602	0.449672502931207	0.321722096763551	0.237496974412352
52	0.0041802495252341	0.377575985854492	0.172844951972365	0.0709214166272432	0.545118930051103	0.386355616385117	0.221694513922557	0.0383596925530583	0.496467129327357	0.445088009815663	0.557745777070522	0.693855358986184	0.0805488636251539	0.546723618870601	0.176208014367148	0.187749891774729	0.220800093840808	0.515505586052313	0.781808987725526	0.0498490098398179
53	0.00425751856528223	0.0841640983708203	0.0387657668907195	0.77894954639487	0.0118265384808183	0.713090416975319	0.458025665255263	0.74047913798131	0.863210616167635	0.521472702268511	0.810110576450825	0.753110131248832	0.0128105990588665	0.970738404430449	0.805698901647702	0.353794576367363	0.799284345470369	0.878446194343269	0.884330089669675	0.939427450764924
54	0.00426093093119562	0.153547666734084	0.381534192012623	0.321758558508009	0.427915301173925	0.613634313922375	0.654181690886617	0.417538502486423	0.454630892723799	0.965851980028674	0.336711104959249	0.853823249228299	0.268140280153602	0.47203312558122	0.863993570674211	0.51130635663867	0.593586348230019	0.884292727103457	0.370439133606851	0.706301348749548
55	0.00428508990444243	0.105298783397302	0.644706329563633	0.659174422733486	0.6894555632025	0.72703453223221	0.255143147427589	0.0582269141450524	0.00146635784767568	0.0212153398897499	0.225922800833359	0.317763218889013	0.275477467104793	0.480049854610115	0.0977867199108005	0.686037741834298	0.758867823053151	0.0199273058678955	0.848614445654675	0.805880650877953
56	0.00431479583494365	0.0424919822253287	0.0502910723444074	0.596376655157655	0.57194370473735	0.892794558545575	0.184718130156398	0.787483715685084	0.990803169785067	0.590495761483908	0.464788358192891	0.0921832083258778	0.634097950067371	0.830170949688181	0.0253718644380569	0.535787401953712	0.20451093907468	0.512824383564293	0.0716390665620565	0.93218103912659
57	0.00433446536771953	0.524380714166909	0.970265563344583	0.359313814900815	0.413442326942459	0.0628430843353271	0.0732088701333851	0.694009973201901	0.201827929588035	0.747698707506061	0.0370512225199491	0.375221759313717	0.351535983383656	0.262213723035529	0.966945725027472	0.0881411829032004	0.0983525062911212	0.873808739474043	0.894839011598378	0.327558801043779
58	0.00446653924882412	0.843536542961374	0.105364711023867	0.118160751182586	0.940603643190116	0.0749663456808776	0.479384962236509	0.243099185172468	0.496906295185909	0.294673375086859	0.686775227077305	0.990222433349118	0.740034226560965	0.670642849756405	0.406523589743301	0.843384861247614	0.672727943630889	0.165835064137354	0.746754412306473	0.965748598799109
59	0.0046022180467844	0.956666482146829	0.0805297328624874	0.229771366342902	0.251617633504793	0.388786412077025	0.464746689423919	0.904225347330794	0.289262931328267	0.744157522683963	0.276841879123822	0.74820623616688	0.603038421133533	0.233964096521959	0.0740005511324853	0.140686566010118	0.141128469491377	0.938526571728289	0.458582968218252	0.587838252540678
60	0.00461160345003009	0.473813735647127	0.581228081602603	0.00370162003673613	0.459114912664518	0.711867091944441	0.996595837175846	0.621184524148703	0.35400984599255	0.0304948415141553	0.616285625379533	0.000279130646958947	0.1245759264566	0.406890911050141	0.310492779361084	0.804401013534516	0.774532619165257	0.114202772267163	0.703635329380631	0.452991388272494
61	0.00461732968688011	0.465384542476386	0.393297479255125	0.848218442639336	0.535970466677099	0.343899488681927	0.258560408139601	0.154349931515753	0.0816676099784672	0.710767717799172	0.381413764320314	0.38638926227577	0.282477381639183	0.946939909132197	0.294280760455877	0.921165406471118	0.503142500761896	0.432406621286646	0.325963022653013	0.356749014696106
62	0.00476845540106297	0.258106101769954	0.547472671139985	0.0121281503234059	0.944041285663843	0.221797088161111	0.477063444443047	0.525213184999302	0.982539091259241	0.590724674984813	0.731559418141842	0.0803768467158079	0.00512704392895103	0.564750900724903	0.138684183359146	0.346762980334461	0.936325509333983	0.94501115847379	0.565963673871011	0.942184565588832
63	0.004947854205966	0.419056745478883	0.513505916809663	0.325571444351226	0.0782336033880711	0.929112715879455	0.229773054365069	0.273874794598669	0.397375925444067	0.603688856586814	0.188717195065692	0.757374514825642	0.672073962166905	0.693470990518108	0.511072629829869	0.777680364670232	0.951613001525402	0.881405857158825	0.825731489108875	0.516175739932805
64	0.00538333901204169	0.322298016864806	0.69410336879082	0.684482586570084	0.477510158903897	0.119516379898414	0.30468803527765	0.399920470546931	0.266370635014027	0.325769947841763	0.0839270846918225	0.684469408355653	0.400978527730331	0.401593925664201	0.778954172041267	0.133627883624285	0.0807626605965197	0.380436617648229	0.734921377385035	0.674469094490632
65	0.00545138143934309	0.306729759089649	0.798511869041249	0.091041088104248	0.145995727041736	0.953173319343477	0.308144008275121	0.869339386699721	0.295038224430755	0.65782743319869	0.495536305941641	0.0872459651436657	0.918030107626691	0.508855571504682	0.837491150246933	0.949852778343484	0.588339598383754	0.1269914903678	0.557877960149199	0.950301753124222
66	0.00546523556113243	0.953895641723648	0.624990389449522	0.914164969464764	0.924697964452207	0.724742524325848	0.880699387285858	0.407653589267284	0.190742693608627	0.151187947252765	0.235000546323135	0.253256030147895	0.530698469374329	0.324303332949057	0.200028688181192	0.456586898537353	0.848966627381742	0.0786357559263706	0.386355879483745	0.205620076041669
67	0.00546850450336933	0.283498687902465	0.441795768914744	0.32221887097694	0.287337142275646	0.209364713868126	0.881016029277816	0.384326739935204	0.94835768151097	0.859553740359843	0.458966447971761	0.670883520739153	0.238765803398564	0.361801155842841	0.180179555667564	0.0272510524373502	0.369755230844021	0.947866276139393	0.747621705755591	0.136986246798187
68	0.00548187806271017	0.475841291481629	0.527673712233081	0.0450103317853063	0.884271819610149	0.778834163676947	0.623733769403771	0.363304463215172	0.874521491583437	0.429681353503838	0.253472779644653	0.580703588901088	0.655549772782251	0.183050571475178	0.802675958489999	0.188729084795341	0.352447766344994	0.740464653354138	0.740104211727157	0.0518087858799845
69	0.00549932359717786	0.241337351268157	0.299286941299215	0.519512358820066	0.83725765440613	0.272787029854953	0.556820150464773	0.0122388352174312	0.515847133472562	0.702381841372699	0.670392811764032	0.825438600499183	0.106037738500163	0.564345984021202	0.711919201072305	0.642084101215005	0.135255376575515	0.397315983427688	0.0112924783024937	0.618799487128854
70	0.00553798023611307	0.607726965565234	0.985878179082647	0.00235681794583797	0.0489395488984883	0.989479569485411	0.294360062573105	0.778990892926231	0.79330208315514	0.0719872878398746	0.647720644949004	0.782589750830084	0.28620068449527	0.10729380720295	0.0940630629193038	0.256561677670106	0.598521643783897	0.198220360325649	0.949472333770245	0.823433440411463
71	0.00560050434432924	0.841474473942071	0.728313690749928	0.924416612135246	0.407194586470723	0.990037083858624	0.199075036449358	0.487947125919163	0.467831827700138	0.315016770502552	0.657835377147421	0.0199113448616117	0.208919802913442	0.754592893877998	0.0829503959976137	0.628889914602041	0.639702387386933	0.558789229718968	0.0484869664069265	0.456474096514285
72	0.00601385021582246	0.250359248835593	0.0670082157012075	0.625505827134475	0.294753764057532	0.420391386374831	0.614196145208552	0.160346552031115	0.104703170713037	0.78440397977829	0.65477497712709	0.320233120117337	0.35102158319205	0.40943976980634	0.56285482365638	0.440241666277871	0.682416625088081	0.921723316423595	0.291479872306809	0.103472447255626
73	0.00604493683204055	0.0837486267555505	0.601602577138692	0.398213956970721	0.0931473581586033	0.757679572561756	0.560057109920308	0.941156754503027	0.429673553910106	0.546325878472999	0.142156403046101	0.190528352046385	0.311363452812657	0.768675104482099	0.311363106127828	0.805618630489334	0.55748757510446	0.441408412763849	0.871549373492599	0.0695229934062809
74	0.0060453072655946	0.300722937099636	0.790151097578928	0.20636883424595	0.741532553685829	0.781886185286567	0.143261167453602	0.446249913191423	0.847585078328848	0.146760326111689	0.593940370483324	0.275638417573646	0.811724295141175	0.960387832950801	0.293531824136153	0.933695522136986	0.298225078498945	0.362555309897289	0.265204551629722	0.072734983637929
75	0.00618986273184419	0.805613982258365	0.0422782835084945	0.76519419089891	0.747756065335125	0.153556932462379	0.903475366299972	0.789524552877992	0.124038710491732	0.504345828667283	0.0374226877465844	0.578413485083729	0.7855708100833	0.618723407853395	0.49429188342765	0.660120268817991	0.956698497291654	0.446374538121745	0.827721469337121	0.965238370699808
76	0.00622922414913774	0.754714199807495	0.240874887211248	0.846543611260131	0.416060121031478	0.501765287714079	0.493345912080258	0.932234370615333	0.998549460433424	0.537886969046667	0.986069408943877	0.910013730404899	0.196587845683098	0.59711445402354	0.923152100061998	0.905041039222851	0.155827670125291	0.235326408175752	0.637032742612064	0.500663405284286
77	0.00656199036166072	0.4190622207243	0.92339661391452	0.240387413650751	0.777029192773625	0.182897685095668	0.839252425124869	0.149391439976171	0.918917854083702	0.101920424494892	0.792602011933923	0.822929354617372	0.35380161460489	0.427825963124633	0.883498140610754	0.808662811294198	0.99868420092389	0.393379490356892	0.790347108384594	0.678845811402425
78	0.00664348667487502	0.00459358072839677	0.546572857536376	0.408510788576677	0.652300629299134	0.0109782852232456	0.148990806657821	0.606702149379998	0.692402821267024	0.595140707446262	0.624587081372738	0.419066367670894	0.732320094015449	0.359192698262632	0.359762549865991	0.0579218443017453	0.756588675314561	0.310564984101802	0.11532178753987	0.183660146314651
79	0.00678812246769667	0.362700945232064	0.359502874547616	0.515522132860497	0.0348010822199285	0.95595286344178	0.304244012804702	0.87255814531818	0.702596328221262	0.00732112652622163	0.639413375407457	0.366068651201203	0.255742775509134	0.0684890409465879	0.824061867315322	0.447865814436227	0.491864521987736	0.391872995300218	0.25138606200926	0.632435228675604
80	0.00685653323307633	0.541366507299244	0.854594117030501	0.0784832155331969	0.168198704021052	0.120467924745753	0.91619707737118	0.591781296068802	0.528874227078632	0.425473352195695	0.799769802950323	0.325179634150118	0.751347981160507	0.430765189696103	0.310842133359984	0.955887380521744	0.197660892037675	0.70256233564578	0.100242212880403	0.237061544088647
81	0.00685829785652459	0.0737346669193357	0.41138032078743	0.0698829633183777	0.608548302203417	0.71946886042133	0.80464562936686	0.945206404430792	0.158339914400131	0.938952728407457	0.251120764762163	0.20651724236086	0.919362138025463	0.41960033075884	0.114274583756924	0.91738243913278	0.82050456199795	0.974134010728449	0.62476803874597	0.238603635458276
82	0.00707517890259624	0.870870778337121	0.289357194909826	0.259514328325167	0.604267461923882	0.0684568914584816	0.506292749429122	0.841520687332377	0.452349721454084	0.812683543656021	0.245138693833724	0.605419112136588	0.808126242598519	0.352228943957016	0.35480533237569	0.00231021549552679	0.0358688889537007	0.0878025544807315	0.336205598665401	0.73932251194492
83	0.00717375380918384	0.0342054225038737	0.80758727598004	0.183317706221715	0.136540611973032	0.256744920974597	0.141275044297799	0.9284223979339	0.493453248403966	0.735569558804855	0.416106038028374	0.604572874028236	0.0248288377188146	0.582600901369005	0.846532450523227	0.14065939700231	0.54291882016696	0.717121104476973	0.0438175313174725	0.820026016095653
84	0.00729084201157093	0.785489923553541	0.445220056222752	0.55330244358629	0.548337149433792	0.481341941747814	0.0649395957589149	0.311612937599421	0.463683495763689	0.628922906238586	0.556117800995708	0.347487089922652	0.408390263561159	0.171398841077462	0.726001610048115	0.677249144762754	0.357470812974498	0.251090686069801	0.586853001732379	0.88893955363892
85	0.00749766733497381	0.696531714871526	0.422463096445426	0.180286979069933	0.353188456967473	0.848636371549219	0.679648281307891	0.311712359311059	0.265766488853842	0.604466279037297	0.787453394616023	0.933836678508669	0.449902335647494	0.710774171166122	0.350267749978229	0.413002075627446	0.0626454949378967	0.143665216630325	0.150194650283083	0.804719494190067
86	0.00751169468276203	0.285741023952141	0.338663467671722	0.494153538020328	0.202513837721199	0.654696987941861	0.46396286203526	0.819498291704804	0.938656349666417	0.477913737762719	0.411398720229045	0.73476783791557	0.835594545118511	0.175678620580584	0.199021044652909	0.631595261162147	0.408487740904093	0.804965088609606	0.28771815309301	0.328727405751124
87	0.00752854882739484	0.913574005942792	0.336505033541471	0.179638955974951	0.968867269344628	0.158892794046551	0.480028193444014	0.512894966639578	0.85756973689422	0.698473029537126	0.823700107401237	0.0390946841798723	0.285054660402238	0.412629747530445	0.733085129177198	0.198345998302102	0.646600055508316	0.725847140420228	0.646383410086855	0.0498906851280481
88	0.00757219130173326	0.0109403633978218	0.0633918107487261	0.479443948948756	0.545709404861555	0.37883112533018	0.104878830956295	0.972151780733839	0.961714592529461	0.387986887944862	0.994966646423563	0.715012297499925	0.881253282073885	0.777751576155424	0.0354795877356082	0.212195488158613	0.926689445739612	0.660283888923004	0.339678987627849	0.471689041471109
89	0.00766970543190837	0.527284760028124	0.19934118911624	0.921548759331927	0.0793959787115455	0.683287604479119	0.0840198057703674	0.118231760570779	0.611601852346212	0.460874487645924	0.166486342903227	0.425509550375864	0.550320131471381	0.41487300908193	0.520359064685181	0.489040250424296	0.608120192075148	0.69959832332097	0.153663606382906	0.619109965860844
90	0.0076968097127974	0.348148304503411	0.487328805960715	0.10420181369409	0.926203251583502	0.836175340460613	0.436845532851294	0.96538327424787	0.794726418796927	0.634292856557295	0.796583076007664	0.667277628555894	0.550096284830943	0.565524800214916	0.709038597531617	0.354924467392266	0.156575371744111	0.26434097182937	0.164955091197044	0.704713765298948
91	0.00779624003916979	0.624267423991114	0.879793904721737	0.426280555315316	0.0380290842149407	0.75890576466918	0.597809110302478	0.361277499934658	0.814018534030765	0.170196591643617	0.0912573500536382	0.143792857415974	0.362368137808517	0.0403675804845989	0.98333903378807	0.770681650144979	0.607124063419178	0.682219392620027	0.0686333137564361	0.484713949495926
92	0.00779921375215054	0.111018913099542	0.309902536449954	0.610464229714125	0.561885122442618	0.177827055566013	0.903954822104424	0.615223287371919	0.412160600535572	0.271555683808401	0.606007322203368	0.613371159415692	0.409590090857819	0.110419267090037	0.318232621531934	0.17983268154785	0.0855738839600235	0.0786378593184054	0.574028636328876	0.926056883763522
93	0.00794436153955758	0.0866117349360138	0.23199142399244	0.634941894561052	0.787227649940178	0.630225233035162	0.902153197210282	0.199662601342425	0.523980051511899	0.271924757631496	0.79710890725255	0.533140372252092	0.213943782960996	0.376594227040187	0.53774112649262	0.508736013201997	0.488818449899554	0.189685012679547	0.437729346798733	0.292704855790362
94	0.00806893268600106	0.828132682247087	0.177561826771125	0.626462222775444	0.47537758317776	0.924095373833552	0.497223185608163	0.542903080582619	0.557985129300505	0.907707176636904	0.619291658746079	0.441977460170165	0.522619057446718	0.112300214357674	0.531230568652973	0.708021338796243	0.357494516996667	0.901109519181773	0.772741546621546	0.0914568158332258
95	0.00813061115331948	0.137317112181336	0.740909490035847	0.11798402084969	0.437485556351021	0.805863510118797	0.387181208236143	0.182686796411872	0.724204962374642	0.666924481745809	0.579842306906357	0.460731079569086	0.789949133060873	0.97352315695025	0.357460082974285	0.795144100673497	0.0833099172450602	0.630377059336752	0.416395920794457	0.294484910788015
96	0.00814265152439475	0.11787271592766	0.150207003811374	0.927752236602828	0.152769709704444	0.984722716966644	0.033823377918452	0.482468761503696	0.340082741575316	0.650652398355305	0.417405683314428	0.562931626103818	0.479446476558223	0.425017828121781	0.208957926370203	0.820624065119773	0.763077961979434	0.37714247754775	0.690684970468283	0.825754955410957
97	0.00814613536931574	0.131975577911362	0.234567225677893	0.775960420025513	0.0371721654664725	0.715427521150559	0.83360133972019	0.352324346546084	0.983107637614012	0.818627365864813	0.708007084671408	0.240082449745387	0.545934876659885	0.0160056895110756	0.803137894254178	0.453650089679286	0.377719605341554	0.26844239840284	0.338550285203382	0.809020956512541
98	0.00827531493268907	0.276867087231949	0.471320938318968	0.981542524881661	0.648397818440571	0.738290718290955	0.945892376359552	0.351839146111161	0.397414975101128	0.0967028334271163	0.466776972170919	0.419006321113557	0.822865034919232	0.129513618536294	0.698073309846222	0.895824455423281	0.790303804911673	0.718757815193385	0.64297569799237	0.677666405215859
99	0.00829991768114269	0.938895464409143	0.352353760041296	0.12851698580198	0.270959601737559	0.466216505272314	0.10763371665962	0.863899735035375	0.446031970437616	0.318085219711065	0.0716640164609998	0.712500199908391	0.345939709572122	0.261633815709502	0.429479747312143	0.375771029852331	0.00993067282252014	0.0133703623432666	0.468206264078617	0.0531278869602829
100	0.0083282443229109	0.374690069817007	0.919465467566624	0.558314105495811	0.289440111024305	0.455159484408796	0.52975751995109	0.279029932571575	0.276656360831112	0.504809467121959	0.726947123417631	0.442582226125523	0.858213118743151	0.314072647830471	0.598937029251829	0.581030135741457	0.262735865078866	0.0938310839701444	0.212563763605431	0.545174148632213

Rows: 1-100 | Columns: 20

To monitor how VerticaPy is computing the aggregations, use the set_option() function to turn on SQL code generation and turn off cache:

[22]:

vp.set_option("sql_on", True)
vp.set_option("cache", False)

VerticaPy allows you to send multiple queries, either iteratively or concurrently, to the database when computing aggregations.

First, let’s send a single query to compute the average for all columns in the vDataFrame:

[23]:

display(vdf.avg(ncols_block = 20))

Computing the different aggregations.

SELECT
/*+LABEL('vDataframe.aggregate')*/ AVG("x0"),
AVG("x1"),
AVG("x2"),
AVG("x3"),
AVG("x4"),
AVG("x5"),
AVG("x6"),
AVG("x7"),
AVG("x8"),
AVG("x9"),
AVG("x10"),
AVG("x11"),
AVG("x12"),
AVG("x13"),
AVG("x14"),
AVG("x15"),
AVG("x16"),
AVG("x17"),
AVG("x18"),
AVG("x19")
FROM
"public"."test_dataset"
LIMIT 1

	avg
"x0"	0.499510914725466
"x1"	0.500098380795368
"x2"	0.498650639873019
"x3"	0.498249878197482
"x4"	0.498723634853531
"x5"	0.49859746068794
"x6"	0.499440802120219
"x7"	0.500571392295512
"x8"	0.500615366488232
"x9"	0.500191712573576
"x10"	0.500156461340457
"x11"	0.500385406252143
"x12"	0.499548987546798
"x13"	0.498913883469834
"x14"	0.500118768013034
"x15"	0.499257680216099
"x16"	0.499268517902286
"x17"	0.499183649747421
"x18"	0.498663500893123
"x19"	0.498528332974443

Rows: 1-20 | Columns: 2

We see that there was one SELECT query for all columns in the vDataFrame. You can reduce the impact on the system by using the ncols_block parameter to split the computation into multiple iterative queries, where the value of the parameter is the number of columns included in each query.

For example, setting ncols_block to 5 will split the computation, which consists of 20 total columns, into 4 separate queries, each of which computes the average for 5 columns:

[24]:

display(vdf.avg(ncols_block = 5))

  0%|          | 0/4 [00:00<?, ?it/s]

Computing the different aggregations.

SELECT
/*+LABEL('vDataframe.aggregate')*/ AVG("x0"),
AVG("x1"),
AVG("x2"),
AVG("x3"),
AVG("x4")
FROM
"public"."test_dataset"
LIMIT 1

 25%|██▌       | 1/4 [00:00<00:00,  7.57it/s]

Computing the different aggregations.

SELECT
/*+LABEL('vDataframe.aggregate')*/ AVG("x5"),
AVG("x6"),
AVG("x7"),
AVG("x8"),
AVG("x9")
FROM
"public"."test_dataset"
LIMIT 1

 50%|█████     | 2/4 [00:00<00:00,  7.35it/s]

Computing the different aggregations.

SELECT
/*+LABEL('vDataframe.aggregate')*/ AVG("x10"),
AVG("x11"),
AVG("x12"),
AVG("x13"),
AVG("x14")
FROM
"public"."test_dataset"
LIMIT 1

 75%|███████▌  | 3/4 [00:00<00:00,  7.44it/s]

Computing the different aggregations.

SELECT
/*+LABEL('vDataframe.aggregate')*/ AVG("x15"),
AVG("x16"),
AVG("x17"),
AVG("x18"),
AVG("x19")
FROM
"public"."test_dataset"
LIMIT 1

100%|██████████| 4/4 [00:00<00:00,  7.50it/s]

	avg
"x0"	0.499510914725466
"x1"	0.500098380795368
"x2"	0.498650639873019
"x3"	0.498249878197482
"x4"	0.498723634853531
"x5"	0.49859746068794
"x6"	0.499440802120219
"x7"	0.500571392295512
"x8"	0.500615366488232
"x9"	0.500191712573576
"x10"	0.500156461340457
"x11"	0.500385406252143
"x12"	0.499548987546798
"x13"	0.498913883469834
"x14"	0.500118768013034
"x15"	0.499257680216099
"x16"	0.499268517902286
"x17"	0.499183649747421
"x18"	0.498663500893123
"x19"	0.498528332974443

Rows: 1-20 | Columns: 2

In addition to spliting up the computation into separate queries, you can send multiple queries to the database concurrently. You specify the number of concurrent queries with the processes parameter, which defines the number of workers involved in the computation. Each child process creates a DB connection and then sends its query. In the following example, we use 4 ‘processes’:

[25]:

display(vdf.avg(ncols_block = 5, processes = 4))

	avg
"x0"	0.499510914725466
"x1"	0.500098380795368
"x2"	0.498650639873019
"x3"	0.498249878197482
"x4"	0.498723634853531
"x5"	0.49859746068794
"x6"	0.499440802120219
"x7"	0.500571392295512
"x8"	0.500615366488232
"x9"	0.500191712573576
"x10"	0.500156461340457
"x11"	0.500385406252143
"x12"	0.499548987546798
"x13"	0.498913883469834
"x14"	0.500118768013034
"x15"	0.499257680216099
"x16"	0.499268517902286
"x17"	0.499183649747421
"x18"	0.498663500893123
"x19"	0.498528332974443

Rows: 1-20 | Columns: 2

Whether it’s best to send multiple queries iteratively, in parallel, or in one single query depends on your specific use case, and often on the structure of your dataset.