Loading...

Pivot Table

General

Let’s begin by importing VerticaPy.

import verticapy as vp

Let’s also import numpy to create a random dataset.

import numpy as np

Let’s generate a dataset using the following data.

N = 100 # Number of records

data = vp.vDataFrame({
    "category1": [np.random.choice(['A','B','C']) for _ in range(N)],
    "category2": [np.random.choice(['D','E']) for _ in range(N)],
    "score1": np.random.normal(10, 2, N),
    "score2": np.random.normal(5, 1.5, N),
})

In the context of data visualization, we have the flexibility to harness multiple plotting libraries to craft a wide range of graphical representations. VerticaPy, as a versatile tool, provides support for several graphic libraries, such as Matplotlib, Highcharts, and Plotly. Each of these libraries offers unique features and capabilities, allowing us to choose the most suitable one for our specific data visualization needs.

_images/plotting_libs.png

Note

To select the desired plotting library, we simply need to use the set_option function. VerticaPy offers the flexibility to smoothly transition between different plotting libraries. In instances where a particular graphic is not supported by the chosen library or is not supported within the VerticaPy framework, the tool will automatically generate a warning and then switch to an alternative library where the graphic can be created.

Please click on the tabs to view the various graphics generated by the different plotting libraries.

We can switch to using the plotly module.

vp.set_option("plotting_lib", "plotly")

VerticaPy has the capability to calculate comprehensive pivot tables and can also automatically discretize and group numerical features, simplifying the data analysis process.

data.pivot_table(columns = ["category1", "category2"])

We load the VerticaPy chart extension.

%load_ext verticapy.chart

We write the SQL query using Jupyter magic cells.

%%chart -k spider
SELECT category1, category2, COUNT(*) FROM :data GROUP BY 1, 2;

We can switch to using the highcharts module.

vp.set_option("plotting_lib", "highcharts")

VerticaPy has the capability to calculate comprehensive pivot tables and can also automatically discretize and group numerical features, simplifying the data analysis process.

data.pivot_table(columns = ["category1", "category2"])

We load the VerticaPy chart extension.

%load_ext verticapy.chart

We write the SQL query using Jupyter magic cells.

%%chart -k spider
SELECT category1, category2, COUNT(*) FROM :data GROUP BY 1, 2;
Loading....

We can switch to using the matplotlib module.

vp.set_option("plotting_lib", "matplotlib")

VerticaPy has the capability to calculate comprehensive pivot tables and can also automatically discretize and group numerical features, simplifying the data analysis process.

data.pivot_table(columns = ["category1", "category2"])
Out[3]: <Axes: xlabel='category1', ylabel='category2'>
_images/plotting_matplotlib_pivot.png

We load the VerticaPy chart extension.

%load_ext verticapy.chart

We write the SQL query using Jupyter magic cells.

%%chart -k spider
SELECT category1, category2, COUNT(*) FROM :data GROUP BY 1, 2;
_images/plotting_matplotlib_pivot1.png
data.hexbin(columns = ["score1", "score2"])
Out[4]: <Axes: xlabel='score1', ylabel='score2'>
_images/plotting_matplotlib_hexbin.png

Custom Aggregations

Within the VerticaPy framework, you have the flexibility to apply a wide array of aggregation techniques according to your specific analytical needs. This extends to the option of utilizing SQL statements, allowing you to craft custom aggregations that precisely match your data summarization requirements. VerticaPy empowers you with the versatility to aggregate data in the manner that best serves your analytical objectives.

Note

In SQL, aggregations can be computed directly within the input SQL statement, but in Python, the process is a bit different.

General Options

data.pivot_table(columns = ["category1", "category2"], method = "count", of = "score1")