verticapy.machine_learning.vertica.automl.AutoClustering

class verticapy.machine_learning.vertica.automl.AutoClustering(name: str | None = None, overwrite_model: bool = False, n_cluster: int | None = None, init: Literal['kmeanspp', 'random'] | list | ndarray = 'kmeanspp', max_iter: int = 300, tol: float = 0.0001, use_kprototype: bool = False, gamma: float = 1.0, preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False, 'na_method': 'drop', 'outliers_threshold': 3.0, 'standardize_min_cat': 0}, print_info: bool = True)

Automatically creates k different groups with which to generalize the data.

Parameters

name: str, optional

Name of the model.

overwrite_model: bool, optional

If set to True, training a model with the same name as an existing model overwrites the existing model.

n_cluster: int, optional

Number of clusters. If empty, an optimal number of clusters is determined using multiple k-means models.
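
The page does not say how the optimal number is chosen. As a rough illustration only, the sketch below fits one k-means model per candidate k and keeps the smallest k whose elbow-style score (share of total variance explained by the clustering) reaches a threshold. The helper names, the 1-D toy data, the deterministic seeding, and the threshold are all assumptions made for this example, not the library's actual procedure.

```python
def kmeans_1d(points, k, max_iter=300, tol=1e-4):
    """Tiny 1-D k-means with deterministic, evenly spread initial centers."""
    pts = sorted(points)
    # Illustrative seeding: take k points spread across the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


def elbow_score(points, centers):
    """1 - (within-cluster sum of squares / total sum of squares)."""
    mean = sum(points) / len(points)
    total = sum((p - mean) ** 2 for p in points)
    within = sum(min((p - c) ** 2 for c in centers) for p in points)
    return 1 - within / total


def pick_k(points, k_values, threshold):
    """Smallest candidate k whose elbow score reaches the threshold."""
    for k in k_values:
        if elbow_score(points, kmeans_1d(points, k)) >= threshold:
            return k
    return k_values[-1]  # fall back to the largest candidate


# Toy data with three well-separated groups (values made up for the example).
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
best = pick_k(data, [1, 2, 3, 4, 5], threshold=0.9)
```

Passing an explicit n_cluster skips any search of this kind.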

init: str | list, optional

The method for finding the initial cluster centers.

kmeanspp: Uses the k-means++ method to initialize the centers. [Only available when use_kprototype is set to False]

random: Randomly subsamples the data to find initial centers.

Alternatively, you can specify a list with the initial cluster centers.
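
To make the two named strategies concrete, here is a minimal pure-Python sketch of each. It follows the textbook description of k-means++ (each further center is drawn with probability proportional to its squared distance from the nearest center already chosen); the function names are hypothetical, and the in-database implementation may differ in detail.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_random(points, k, rng):
    """'random': subsample k distinct rows to serve as initial centers."""
    return rng.sample(points, k)


def init_kmeanspp(points, k, rng):
    """'kmeanspp': spread the initial centers out.

    The first center is drawn uniformly; every further center is drawn
    with probability proportional to its squared distance from the
    nearest center chosen so far. Assumes at least k distinct points.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


rng = random.Random(0)
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
pp_centers = init_kmeanspp(pts, 2, rng)
rand_centers = init_random(pts, 3, rng)

# The third option: pass explicit starting centers as a list.
explicit_centers = [[0.0, 0.0], [10.0, 10.0]]
```

Because an already chosen point has weight zero, k-means++ never picks the same row twice, which is why it tends to give better-separated starting centers than plain subsampling.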

max_iter: int, optional

The maximum number of iterations for the algorithm.

tol: float, optional

Determines whether the algorithm has converged. The algorithm is considered converged after no center has moved more than a distance of ‘tol’ from the previous iteration.
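
The interplay of max_iter and tol can be sketched as a loop that stops at whichever limit is hit first. This sketch assumes Euclidean distance for the movement check (the description above only says "a distance"), and the helper names and the update function are hypothetical stand-ins for a real k-means step.

```python
import math


def has_converged(old_centers, new_centers, tol=1e-4):
    """True when no center moved more than `tol` since the last iteration."""
    return all(math.dist(o, n) <= tol for o, n in zip(old_centers, new_centers))


def iterate(update, centers, max_iter=300, tol=1e-4):
    """Apply `update` until convergence or until max_iter is reached.

    Returns the final centers and the number of iterations performed.
    """
    for i in range(1, max_iter + 1):
        new_centers = update(centers)
        done = has_converged(centers, new_centers, tol)
        centers = new_centers
        if done:
            return centers, i
    return centers, max_iter


def halve(centers):
    """Stand-in update: each call moves every center halfway to the origin."""
    return [(x / 2, y / 2) for x, y in centers]


final_centers, n_iter = iterate(halve, [(1.0, 0.0)])
capped_centers, capped_iter = iterate(halve, [(1.0, 0.0)], max_iter=5)
```

A smaller tol demands more iterations before the loop stops, while max_iter bounds the run time when the centers keep moving.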

use_kprototype: bool, optional

If set to True, the function uses the k-prototypes algorithm instead of k-means. k-prototypes can handle categorical features.

gamma: float, optional

[Only if use_kprototype is set to True] Weighting factor for categorical columns. It determines the relative importance of numerical and categorical attributes.
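
One common formulation of the k-prototypes dissimilarity adds the squared Euclidean distance over numerical columns to gamma times the number of mismatching categorical columns. The sketch below assumes that formulation; the function name is hypothetical and the exact measure used in the database may differ.

```python
def kprototypes_distance(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Mixed-type dissimilarity between two rows.

    num_a / num_b: numerical values of each row.
    cat_a / cat_b: categorical values of each row.
    gamma: weight given to each categorical mismatch.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


# Numerical part: (1-1)^2 + (2-4)^2 = 4.0; one categorical mismatch.
d_half = kprototypes_distance([1.0, 2.0], [1.0, 4.0], ["red", "S"], ["blue", "S"], gamma=0.5)
d_zero = kprototypes_distance([1.0, 2.0], [1.0, 4.0], ["red", "S"], ["blue", "S"], gamma=0.0)
```

Under this formulation, gamma = 0 ignores categorical columns entirely, and larger values let category mismatches dominate the numerical distance.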

preprocess_data: bool, optional

If True, the data is preprocessed with AutoDataPrep (configured through preprocess_dict) before clustering.

preprocess_dict: dict, optional

Dictionary to pass to the AutoDataPrep class in order to preprocess the data before clustering.
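
A small sketch of building a custom preprocess_dict by starting from the defaults shown in the class signature and overriding only what changes. The helper name is hypothetical, and the commented usage at the end assumes a live Vertica connection plus a made-up table and made-up column names.

```python
# Defaults copied from the class signature above.
DEFAULT_PREPROCESS = {
    "identify_ts": False,
    "na_method": "drop",
    "outliers_threshold": 3.0,
    "standardize_min_cat": 0,
}


def preprocess_options(**overrides):
    """Return a fresh dict: the defaults with the given keys overridden."""
    return {**DEFAULT_PREPROCESS, **overrides}


# Example: use a looser outlier threshold, keep everything else.
opts = preprocess_options(outliers_threshold=4.0)

# Hypothetical usage (requires a Vertica connection, so it is not run here):
# from verticapy.machine_learning.vertica.automl import AutoClustering
# model = AutoClustering(n_cluster=3, preprocess_dict=opts)
# model.fit("public.my_table", X=["col1", "col2"])
```

Building a new dictionary for each model avoids accidentally sharing or mutating the default dictionary between instances.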

print_info: bool, optional

If True, prints the model information at each step.

Attributes

preprocess_: object

Model used to preprocess the data.

model_: object

Final model used for clustering.

__init__(name: str | None = None, overwrite_model: bool = False, n_cluster: int | None = None, init: Literal['kmeanspp', 'random'] | list | ndarray = 'kmeanspp', max_iter: int = 300, tol: float = 0.0001, use_kprototype: bool = False, gamma: float = 1.0, preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False, 'na_method': 'drop', 'outliers_threshold': 3.0, 'standardize_min_cat': 0}, print_info: bool = True) -> None

Must be overridden in the child class

Methods

__init__([name, overwrite_model, n_cluster, ...])

Must be overridden in the child class

contour([nbins, chart])

Draws the model's contour plot.

deploySQL([X])

Returns the SQL code needed to deploy the model.

does_model_exists(name[, raise_error, ...])

Checks whether the model is stored in the Vertica database.

drop()

Drops the model from the Vertica database.

export_models(name, path[, kind])

Exports machine learning models.

fit(input_relation[, X, return_report])

Trains the model.

get_attributes([attr_name])

Returns the model attributes.

get_match_index(x, col_list[, str_check])

Returns the matching index.

get_params()

Returns the parameters of the model.

get_plotting_lib([class_name, chart, ...])

Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.

get_vertica_attributes([attr_name])

Returns the model Vertica attributes.

import_models(path[, schema, kind])

Imports machine learning models.

register(registered_name[, raise_error])

Registers the model and adds it to the in-DB model versioning environment with a status of 'under_review'.

set_params([parameters])

Sets the parameters of the model.

summarize()

Summarizes the model.

to_binary(path)

Exports the model to the Vertica Binary format.

to_pmml(path)

Exports the model to PMML.

to_python([return_proba, ...])

Returns the Python function needed for in-memory scoring without using built-in Vertica functions.

to_sql([X, return_proba, ...])

Returns the SQL code needed to deploy the model without using built-in Vertica functions.

to_tf(path)

Exports the model to the Frozen Graph format (TensorFlow).

Attributes

object_type