verticapy.vDataFrame.scatter#
- vDataFrame.scatter(columns: str | list[str], by: str | None = None, size: str | None = None, cmap_col: str | None = None, max_cardinality: int = 6, cat_priority: None | bool | float | str | timedelta | datetime | list | ndarray = None, max_nb_points: int = 20000, dimensions: tuple = None, bbox: tuple | None = None, img: str | None = None, chart: PlottingBase | TableSample | Axes | mFigure | Highchart | Highstock | Figure | None = None, **style_kwargs) PlottingBase | TableSample | Axes | mFigure | Highchart | Highstock | Figure #
Draws the scatter plot of the input vDataColumns.
Parameters#
- columns: SQLColumns
List of the names of the vDataColumns to plot.
- by: str, optional
Categorical vDataColumn used to label the data.
- size: str, optional
Numerical vDataColumn used to represent the bubble size.
- cmap_col: str, optional
Numerical column used to represent the color map.
- max_cardinality: int, optional
Maximum number of distinct elements for ‘by’ to be used as categorical. The less frequent elements are gathered together to create a new category: ‘Others’.
- cat_priority: PythonScalar / ArrayLike, optional
ArrayLike list of the categories to consider when labeling the data using the vDataColumn ‘by’. All other categories are filtered out.
- max_nb_points: int, optional
Maximum number of points to display.
- dimensions: tuple, optional
Tuple of two elements representing the IDs of the PCA components to draw. If empty and the number of input columns is greater than 3, the first and second principal components are drawn.
- bbox: tuple, optional
Tuple of 4 elements delimiting the boundaries of the final plot. It must follow the order: [xmin, xmax, ymin, ymax]
- img: str, optional
Path to the image to display as background.
- chart: PlottingObject, optional
The chart object to plot on.
- **style_kwargs
Any optional parameter to pass to the plotting functions.
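The labeling rules described for max_cardinality and cat_priority can be illustrated with a small pure-Python sketch. This is not VerticaPy's implementation (which works on the database side); the helper name label_categories is hypothetical, and the sketch assumes that the max_cardinality most frequent categories keep their label and that cat_priority is given as a list.

```python
from collections import Counter


def label_categories(values, max_cardinality=6, cat_priority=None):
    """Hypothetical helper mirroring the documented labeling rules."""
    if cat_priority is not None:
        # Assumption: rows whose category is not listed are filtered out.
        keep = set(cat_priority)
        return [v for v in values if v in keep]
    # Assumption: the `max_cardinality` most frequent categories keep
    # their own label; the less frequent ones are relabeled 'Others'.
    top = {cat for cat, _ in Counter(values).most_common(max_cardinality)}
    return [v if v in top else "Others" for v in values]


values = ["A", "A", "A", "B", "B", "C", "D"]

# Only the two most frequent categories ('A' and 'B') keep their label.
grouped = label_categories(values, max_cardinality=2)

# Only the listed categories are kept; the other rows are dropped.
filtered = label_categories(values, cat_priority=["A", "C"])

print(grouped)
print(filtered)
```

With the default max_cardinality of 6, a ‘by’ column with three categories such as the one in the example below is left untouched.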
Returns#
- obj
Plotting Object.
Examples#
Note
The example below is a very basic one. For more detailed examples and customization options, please see Scatter Plots.
Let’s begin by importing VerticaPy.
import verticapy as vp
Let’s also import numpy to create a dataset.
import numpy as np
We can create a variable N to fix the size:
N = 30
Let’s generate a dataset using the following data.
data = vp.vDataFrame(
    {
        "category": [np.random.choice(['A','B','C']) for _ in range(N)],
        "x": np.random.normal(5, 1, N),
        "y": np.random.normal(8, 1.5, N),
        "z": np.random.normal(10, 2, N),
    }
)
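Because the dataset is random, every run produces different points. If reproducible plots are needed, one option is to draw the same columns from a seeded NumPy generator. The sketch below only builds the dictionary; the seed value 42 is arbitrary, and the name columns_data is introduced here for illustration.

```python
import numpy as np

N = 30

# A seeded generator makes the random columns reproducible across runs.
rng = np.random.default_rng(seed=42)

columns_data = {
    "category": rng.choice(["A", "B", "C"], size=N).tolist(),
    "x": rng.normal(5, 1, N).tolist(),
    "y": rng.normal(8, 1.5, N).tolist(),
    "z": rng.normal(10, 2, N).tolist(),
}

# Every column has exactly N values.
print({name: len(values) for name, values in columns_data.items()})
```

With a Vertica connection available, this dictionary can be passed to vp.vDataFrame in the same way as the dictionary in the example above.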
Below are examples of two types of scatter plots:
2D
data.scatter(columns = ["x", "y"], by = "category")
3D
data.scatter(columns = ["x", "y", "z"])
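When more than three columns are passed, the dimensions parameter indicates that the plot shows PCA components instead of the raw columns. The NumPy sketch below shows what such a projection looks like in general. It is not VerticaPy's code: the helper name pca_project is hypothetical, component IDs are assumed to be 1-based, and the data is only centered (whether VerticaPy also scales the columns is not stated in this page).

```python
import numpy as np


def pca_project(points, dimensions=(1, 2)):
    """Project rows of `points` onto the chosen principal components.

    Hypothetical helper; `dimensions` holds 1-based component IDs
    (an assumption about the ID convention).
    """
    X = np.asarray(points, dtype=float)
    X = X - X.mean(axis=0)  # center each column
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    idx = [d - 1 for d in dimensions]
    return X @ vt[idx].T


rng = np.random.default_rng(0)
points = rng.normal(size=(30, 4))  # four input columns, so more than 3

projected = pca_project(points)  # first and second components
print(projected.shape)  # (30, 2)
```

Each row of projected is a 2D point that could then be drawn like the 2D example above.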
See also
vDataFrame.density(): Density Plot.
vDataFrame.outliers_plot(): Outliers Plot.