verticapy.vDataFrame.scatter

Draws the scatter plot of the input vDataColumns.

Parameters

columns: SQLColumns: List of the vDataColumns names.
by: str, optional: Categorical vDataColumn used to label the data.
size: str, optional: Numerical vDataColumn used to represent the bubble size.
cmap_col: str, optional: Numerical column used to represent the color map.
max_cardinality: int, optional: Maximum number of distinct elements for ‘by’ to be used as categorical. The less frequent elements are gathered together to create a new category: ‘Others’.
cat_priority: PythonScalar / ArrayLike, optional: Categories to consider when labeling the data using the vDataColumn ‘by’. All other categories are filtered out.
max_nb_points: int, optional: Maximum number of points to display.
dimensions: tuple, optional: Tuple of two elements representing the IDs of the PCA components. If empty and the number of input columns is greater than 3, the first and second PCA components are drawn.
bbox: list, optional: List of 4 elements that delimits the boundaries of the final plot. It must have the following form: [xmin, xmax, ymin, ymax].
img: str, optional: Path to the image to display as background.
chart: PlottingObject, optional: The chart object to plot on.
**style_kwargs: Any optional parameter to pass to the plotting functions.
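
The labeling rules described for max_cardinality and cat_priority can be illustrated with a short pure-Python sketch. This is not VerticaPy's implementation: the helper name `label_categories` is hypothetical, and details such as tie-breaking between equally frequent categories are assumptions made for illustration only.

```python
from collections import Counter


def label_categories(values, max_cardinality, cat_priority=None):
    """Hypothetical helper mimicking the documented labeling rules."""
    if cat_priority is not None:
        # Assumption: rows whose category is not listed are dropped.
        keep = set(cat_priority)
        return [v for v in values if v in keep]
    # Keep the most frequent categories; gather the rest into 'Others'.
    top = {cat for cat, _ in Counter(values).most_common(max_cardinality)}
    return [v if v in top else "Others" for v in values]


values = ["A", "A", "A", "B", "B", "C", "D"]

# Only the two most frequent categories keep their own label.
grouped = label_categories(values, max_cardinality=2)

# Only the listed categories are kept; the others are filtered out.
filtered = label_categories(values, max_cardinality=2, cat_priority=["A", "C"])
```

In this sketch, ‘C’ and ‘D’ are relabeled as ‘Others’ in the first call, while the second call keeps only the rows labeled ‘A’ or ‘C’.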

Returns

obj: Plotting Object.

Examples

Note

The example below is a basic one. For more detailed examples and customization options, see Scatter Plots.

Let’s begin by importing VerticaPy.

import verticapy as vp

Let’s also import numpy to create a dataset.

import numpy as np

We can create a variable N to fix the size:

N = 30

Let’s generate a dataset with one categorical column and three numerical columns.

data = vp.vDataFrame(
    {
        "category": [np.random.choice(['A','B','C']) for _ in range(N)],
        "x": np.random.normal(5, 1, N),
        "y": np.random.normal(8, 1.5, N),
        "z": np.random.normal(10, 2, N),
    }
)
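
The random draws above are unseeded, so the dataset changes on every run. Here is a minimal sketch for cases where a reproducible dataset is preferred. It uses only the Python standard library; the helper name `make_columns` and the seed value are hypothetical and not part of VerticaPy.

```python
import random


def make_columns(n, seed=42):
    """Build the same four columns as above, reproducibly (hypothetical helper)."""
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    return {
        "category": [rng.choice(["A", "B", "C"]) for _ in range(n)],
        "x": [rng.gauss(5, 1) for _ in range(n)],
        "y": [rng.gauss(8, 1.5) for _ in range(n)],
        "z": [rng.gauss(10, 2) for _ in range(n)],
    }


columns = make_columns(30)
```

The resulting dictionary has the same keys as the one passed to vp.vDataFrame above, and calling the helper again with the same seed yields identical values.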

Below are examples of two types of scatter plots: a two-column plot labeled by category, and a three-column plot.

data.scatter(columns = ["x", "y"], by = "category")

data.scatter(columns = ["x", "y", "z"])
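
The bbox parameter described under Parameters expects a list of the form [xmin, xmax, ymin, ymax]. The sketch below shows one possible way to derive such a list from plain Python sequences, with a margin around the data. The helper name `padded_bbox` and the `pad` argument are hypothetical and chosen for illustration only.

```python
def padded_bbox(xs, ys, pad=0.05):
    """Return [xmin, xmax, ymin, ymax], widened by a fraction of each axis range."""
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    dx = (xmax - xmin) * pad  # horizontal margin
    dy = (ymax - ymin) * pad  # vertical margin
    return [xmin - dx, xmax + dx, ymin - dy, ymax + dy]


bbox = padded_bbox([0, 10], [2, 4], pad=0.5)
```

A list built this way could then be supplied as the bbox argument of scatter. Whether any padding is appropriate depends on the background image, if one is used.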