
Pivot Table¶
General¶
Let’s begin by importing VerticaPy.
import verticapy as vp
Let’s also import numpy to create a random dataset.
import numpy as np
Let’s generate a dataset using the following data.
N = 100 # Number of records
data = vp.vDataFrame({
"category1": [np.random.choice(['A','B','C']) for _ in range(N)],
"category2": [np.random.choice(['D','E']) for _ in range(N)],
"score1": np.random.normal(10, 2, N),
"score2": np.random.normal(5, 1.5, N),
})
In the context of data visualization, we have the flexibility to harness multiple plotting libraries to craft a wide range of graphical representations. VerticaPy, as a versatile tool, provides support for several graphic libraries, such as Matplotlib, Highcharts, and Plotly. Each of these libraries offers unique features and capabilities, allowing us to choose the most suitable one for our specific data visualization needs.

Note
To select the desired plotting library, we simply need to use the set_option function. VerticaPy offers the flexibility to smoothly transition between different plotting libraries. In instances where a particular graphic is not supported by the chosen library or is not supported within the VerticaPy framework, the tool will automatically generate a warning and then switch to an alternative library where the graphic can be created.
Please click on the tabs to view the various graphics generated by the different plotting libraries.
We can switch to using the plotly module.
vp.set_option("plotting_lib", "plotly")
VerticaPy has the capability to calculate comprehensive pivot tables and can also automatically discretize and group numerical features, simplifying the data analysis process.
data.pivot_table(columns = ["category1", "category2"])
We load the VerticaPy chart extension.
%load_ext verticapy.chart
We write the SQL query using Jupyter magic cells.
%%chart -k spider
SELECT category1, category2, COUNT(*) FROM :data GROUP BY 1, 2;
We can switch to using the highcharts module.
vp.set_option("plotting_lib", "highcharts")
VerticaPy has the capability to calculate comprehensive pivot tables and can also automatically discretize and group numerical features, simplifying the data analysis process.
data.pivot_table(columns = ["category1", "category2"])
We load the VerticaPy chart extension.
%load_ext verticapy.chart
We write the SQL query using Jupyter magic cells.
%%chart -k spider
SELECT category1, category2, COUNT(*) FROM :data GROUP BY 1, 2;
We can switch to using the matplotlib module.
vp.set_option("plotting_lib", "matplotlib")
VerticaPy has the capability to calculate comprehensive pivot tables and can also automatically discretize and group numerical features, simplifying the data analysis process.
data.pivot_table(columns = ["category1", "category2"])
Out[3]: <Axes: xlabel='category1', ylabel='category2'>

data.hexbin(columns = ["score1", "score2"])
Out[4]: <Axes: xlabel='score1', ylabel='score2'>

Custom Aggregations¶
Within the VerticaPy framework, you have the flexibility to apply a wide array of aggregation techniques according to your specific analytical needs. This extends to the option of utilizing SQL statements, allowing you to craft custom aggregations that precisely match your data summarization requirements. VerticaPy empowers you with the versatility to aggregate data in the manner that best serves your analytical objectives.
Note
In SQL, aggregations can be computed directly within the input SQL statement, but in Python, the process is a bit different.
General Options
data.pivot_table(columns = ["category1", "category2"], method = "count", of = "score1")