verticapy.machine_learning.vertica.automl.AutoClustering#

class verticapy.machine_learning.vertica.automl.AutoClustering(name: str | None = None, overwrite_model: bool = False, n_cluster: int | None = None, init: Literal['kmeanspp', 'random'] | list | ndarray = 'kmeanspp', max_iter: int = 300, tol: float = 0.0001, use_kprototype: bool = False, gamma: float = 1.0, preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False, 'na_method': 'drop', 'outliers_threshold': 3.0, 'standardize_min_cat': 0}, print_info: bool = True)#

Automatically creates k different groups with which to generalize the data.

Parameters#

name: str, optional

Name of the model.

overwrite_model: bool, optional

If set to True, training a model with the same name as an existing model overwrites the existing model.

n_cluster: int, optional

Number of clusters. If None (the default), the optimal number of clusters is determined using multiple k-means models.

init: str | list | ndarray, optional

The method for finding the initial cluster centers.

kmeanspp: Uses the k-means++ method to initialize the centers. [Only available when use_kprototype is set to False]
random: Randomly subsamples the data to find initial centers.

Alternatively, you can specify a list with the initial cluster centers.
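The two named strategies can be illustrated in plain Python. This is a rough sketch of the textbook techniques, not Vertica's in-database implementation; the helper names (`sq_dist`, `init_random`, `init_kmeanspp`) and the sample points are made up for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_random(points, k, rng):
    """'random'-style seeding: subsample k distinct rows as initial centers."""
    return rng.sample(points, k)


def init_kmeanspp(points, k, rng):
    """k-means++-style seeding (assumes at least k distinct points).

    The first center is drawn uniformly. Each later center is drawn with
    probability proportional to its squared distance from the nearest
    center chosen so far, which spreads the seeds apart.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical toy data: three loose groups of two points each.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (9.8, 0.3)]
rng = random.Random(0)
pp_centers = init_kmeanspp(points, 3, rng)
random_centers = init_random(points, 3, rng)
```

A list such as `pp_centers` has the same shape as the explicit list of initial centers that `init` also accepts: one coordinate sequence per cluster.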

max_iter: int, optional

The maximum number of iterations for the algorithm.

tol: float, optional

Determines whether the algorithm has converged. The algorithm is considered converged after no center has moved more than a distance of ‘tol’ from the previous iteration.
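The stopping rule described above can be sketched as follows. The function name `has_converged` is hypothetical, and the use of Euclidean distance is an assumption; the text only says "a distance of 'tol'".

```python
import math


def has_converged(old_centers, new_centers, tol=1e-4):
    """Return True when no center moved more than `tol` between iterations.

    Assumes Euclidean distance and that both lists hold the centers in the
    same order.
    """
    return all(
        math.dist(old, new) <= tol
        for old, new in zip(old_centers, new_centers)
    )


# One center barely moves: treated as converged with the default tol.
small_move = has_converged([(0.0, 0.0)], [(0.0, 0.00005)])

# One of two centers moves by about 0.01: not converged.
large_move = has_converged([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.01)])
```

In practice the loop ends at whichever comes first: this condition holding, or `max_iter` iterations being reached.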

use_kprototype: bool, optional

If set to True, the function uses the k-prototypes algorithm instead of k-means. k-prototypes can handle categorical features.

gamma: float, optional

[Only if use_kprototype is set to True] Weighting factor for categorical columns. It determines the relative importance of numerical and categorical attributes.
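To make the role of `gamma` concrete, here is a minimal sketch of the textbook k-prototypes dissimilarity: squared Euclidean distance on the numerical columns plus `gamma` times the number of categorical mismatches. Whether Vertica uses exactly this formula is an assumption, and `kprototype_distance` is a hypothetical helper name.

```python
def kprototype_distance(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Mixed-type dissimilarity between two rows (textbook formulation).

    num_a, num_b: numerical feature values of each row.
    cat_a, cat_b: categorical feature values of each row.
    gamma: weight of each categorical mismatch relative to the numeric part.
    """
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(a != b for a, b in zip(cat_a, cat_b))
    return numeric_part + gamma * mismatches


# Rows differ by 2.0 on one numeric column and on one of two categories.
d_half = kprototype_distance((1.0, 2.0), (1.0, 4.0), ("red", "s"), ("blue", "s"), gamma=0.5)
d_zero = kprototype_distance((1.0, 2.0), (1.0, 4.0), ("red", "s"), ("blue", "s"), gamma=0.0)
```

Under this formulation a larger `gamma` makes categorical disagreement count for more, while `gamma = 0` ignores the categorical columns entirely.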

preprocess_data: bool, optional

If True, the data is preprocessed with AutoDataPrep before clustering, using the settings in preprocess_dict.

preprocess_dict: dict, optional

Dictionary to pass to the AutoDataPrep class in order to preprocess the data before clustering.
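A small sketch of overriding one preprocessing setting while keeping the other documented defaults. The keys and default values are taken from the class signature above; the constant name `DEFAULT_PREPROCESS` and the override value are illustrative. Whether a partial dictionary would be merged with the defaults is not stated here, so the sketch passes a complete dictionary as the cautious choice.

```python
# Defaults as listed in the documented signature of AutoClustering.
DEFAULT_PREPROCESS = {
    "identify_ts": False,
    "na_method": "drop",
    "outliers_threshold": 3.0,
    "standardize_min_cat": 0,
}

# Copy the defaults and change a single entry; the original dict is untouched.
custom_preprocess = {**DEFAULT_PREPROCESS, "outliers_threshold": 4.0}
```

`custom_preprocess` can then be passed as `preprocess_dict` when constructing the model.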

print_info: bool, optional

If True, prints the model information at each step.

Attributes#

preprocess_: object: Model used to preprocess the data.
model_: object: Final model used for clustering.
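Putting the parameters together, a minimal usage sketch might look like the following. It assumes an open Vertica connection and an existing relation; `cluster_relation` is a hypothetical wrapper, and the parameter values are illustrative. Only the constructor arguments and the `fit(input_relation, X)` call come from this page.

```python
# Illustrative settings; every key is a documented constructor parameter.
PARAMS = {
    "n_cluster": None,        # let the class search for a number of clusters
    "init": "kmeanspp",
    "max_iter": 300,
    "tol": 1e-4,
    "use_kprototype": False,
    "preprocess_data": True,
    "print_info": False,
}


def cluster_relation(input_relation, columns, params=None):
    """Fit AutoClustering on a relation (needs a live Vertica connection).

    input_relation: name of the table or view to cluster.
    columns: list of column names to use as features.
    """
    # Imported lazily so this sketch can be loaded without verticapy
    # installed or a database session open.
    from verticapy.machine_learning.vertica.automl import AutoClustering

    model = AutoClustering(**(PARAMS if params is None else params))
    model.fit(input_relation, X=columns)
    return model


# Example call (relation and column names are placeholders):
# model = cluster_relation("my_schema.my_table", ["col_a", "col_b"])
```

After fitting, the `preprocess_` and `model_` attributes listed above hold the preprocessing model and the final clustering model.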

__init__(name: str | None = None, overwrite_model: bool = False, n_cluster: int | None = None, init: Literal['kmeanspp', 'random'] | list | ndarray = 'kmeanspp', max_iter: int = 300, tol: float = 0.0001, use_kprototype: bool = False, gamma: float = 1.0, preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False, 'na_method': 'drop', 'outliers_threshold': 3.0, 'standardize_min_cat': 0}, print_info: bool = True) → None#

Must be overridden in the child class

Methods

`__init__`([name, overwrite_model, n_cluster, ...])	Must be overridden in the child class
`contour`([nbins, chart])	Draws the model's contour plot.
`deploySQL`([X])	Returns the SQL code needed to deploy the model.
`does_model_exists`(name[, raise_error, ...])	Checks whether the model is stored in the Vertica database.
`drop`()	Drops the model from the Vertica database.
`export_models`(name, path[, kind])	Exports machine learning models.
`fit`(input_relation[, X, return_report])	Trains the model.
`get_attributes`([attr_name])	Returns the model attributes.
`get_match_index`(x, col_list[, str_check])	Returns the matching index.
`get_params`()	Returns the parameters of the model.
`get_plotting_lib`([class_name, chart, ...])	Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.
`get_vertica_attributes`([attr_name])	Returns the model Vertica attributes.
`import_models`(path[, schema, kind])	Imports machine learning models.
`register`(registered_name[, raise_error])	Registers the model and adds it to the in-DB model versioning environment with a status of 'under_review'.
`set_params`([parameters])	Sets the parameters of the model.
`summarize`()	Summarizes the model.
`to_binary`(path)	Exports the model to the Vertica Binary format.
`to_pmml`(path)	Exports the model to PMML.
`to_python`([return_proba, ...])	Returns the Python function needed for in-memory scoring without using built-in Vertica functions.
`to_sql`([X, return_proba, ...])	Returns the SQL code needed to deploy the model without using built-in Vertica functions.
`to_tf`(path)	Exports the model to the Frozen Graph format (TensorFlow).

Attributes

object_type