verticapy.machine_learning.vertica.automl.AutoClustering
- class verticapy.machine_learning.vertica.automl.AutoClustering(name: str | None = None, overwrite_model: bool = False, n_cluster: int | None = None, init: Literal['kmeanspp', 'random'] | list | ndarray = 'kmeanspp', max_iter: int = 300, tol: float = 0.0001, use_kprototype: bool = False, gamma: float = 1.0, preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False, 'na_method': 'drop', 'outliers_threshold': 3.0, 'standardize_min_cat': 0}, print_info: bool = True)
Automatically creates k different groups with which to generalize the data.
Parameters
- name: str, optional
Name of the model.
- overwrite_model: bool, optional
If set to True, training a model with the same name as an existing model overwrites the existing model.
- n_cluster: int, optional
Number of clusters. If None, the optimal number of clusters is determined using multiple k-means models.
- init: str | list, optional
The method for finding the initial cluster centers:
- 'kmeanspp': Uses the k-means++ method to initialize the centers. [Only available when use_kprototype is set to False]
- 'random': Randomly subsamples the data to find initial centers.
Alternatively, you can specify a list with the initial cluster centers.
- max_iter: int, optional
The maximum number of iterations for the algorithm.
- tol: float, optional
Determines whether the algorithm has converged. The algorithm is considered converged after no center has moved more than a distance of ‘tol’ from the previous iteration.
- use_kprototype: bool, optional
If set to True, the function uses the k-prototypes algorithm instead of k-means. k-prototypes can handle categorical features.
- gamma: float, optional
[Only if use_kprototype is set to True] Weighting factor for categorical columns. It determines the relative importance of numerical and categorical attributes.
- preprocess_data: bool, optional
If True, the data is preprocessed with AutoDataPrep before clustering.
- preprocess_dict: dict, optional
Dictionary to pass to the AutoDataPrep class in order to preprocess the data before clustering.
- print_info: bool, optional
If True, prints the model information at each step.
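The meaning of tol and gamma can be illustrated with a small, self-contained sketch. It is not VerticaPy's or Vertica's implementation: the helper names has_converged and mixed_dissimilarity are hypothetical, the use of Euclidean distance for center movement is an assumption, and the mixed dissimilarity follows the commonly cited k-prototypes formulation (squared Euclidean distance on numerical values plus gamma times the number of categorical mismatches), which may differ in detail from what the database computes.

```python
import math


def has_converged(old_centers, new_centers, tol=1e-4):
    """Return True when no center moved farther than tol.

    Assumption: movement is measured as Euclidean distance.
    """
    return all(
        math.dist(old, new) <= tol
        for old, new in zip(old_centers, new_centers)
    )


def mixed_dissimilarity(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Squared Euclidean distance on numerical values plus gamma times
    the number of categorical mismatches (textbook k-prototypes cost)."""
    numerical = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numerical + gamma * categorical


# One center moved by 0.00005, the other did not move: within tol=0.0001.
print(has_converged([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.00005), (1.0, 1.0)]))  # True
# A center moved by 0.5: not converged.
print(has_converged([(0.0, 0.0)], [(0.5, 0.0)]))  # False
# A larger gamma makes the single categorical mismatch weigh more.
print(mixed_dissimilarity([1.0, 2.0], [1.0, 4.0], ["red"], ["blue"], gamma=0.5))  # 4.5
print(mixed_dissimilarity([1.0, 2.0], [1.0, 4.0], ["red"], ["blue"], gamma=2.0))  # 6.0
```

In this sketch, max_iter would simply cap the number of times the centers are recomputed if has_converged never returns True.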
Attributes
- preprocess_: object
Model used to preprocess the data.
- model_: object
Final model used for clustering.
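As a rough usage sketch, the constructor parameters and the fit method documented on this page could be combined as follows. The helper name fit_auto_clustering is hypothetical, and the code assumes that VerticaPy is installed and that a connection to a Vertica database has already been established; it has not been run against a live database here.

```python
def fit_auto_clustering(input_relation, columns, n_cluster=None):
    """Train an AutoClustering model and return it.

    input_relation: name of the relation to cluster (placeholder value).
    columns: list of column names passed to fit as X.
    n_cluster: number of clusters, or None to let the class choose.
    """
    # Imported lazily so that defining this helper does not require VerticaPy.
    from verticapy.machine_learning.vertica.automl import AutoClustering

    model = AutoClustering(
        n_cluster=n_cluster,
        preprocess_data=True,  # preprocess the data before clustering
        print_info=False,      # do not print model information at each step
    )
    model.fit(input_relation, X=columns)
    return model
```

With a live connection, a call such as fit_auto_clustering("public.customers", ["age", "income"]) would return the trained object, whose preprocess_ and model_ attributes hold the preprocessing model and the final clustering model. The table and column names in that call are placeholders.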
- __init__(name: str | None = None, overwrite_model: bool = False, n_cluster: int | None = None, init: Literal['kmeanspp', 'random'] | list | ndarray = 'kmeanspp', max_iter: int = 300, tol: float = 0.0001, use_kprototype: bool = False, gamma: float = 1.0, preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False, 'na_method': 'drop', 'outliers_threshold': 3.0, 'standardize_min_cat': 0}, print_info: bool = True) -> None
Must be overridden in the child class.
Methods
__init__([name, overwrite_model, n_cluster, ...]): Must be overridden in the child class.
contour([nbins, chart]): Draws the model's contour plot.
deploySQL([X]): Returns the SQL code needed to deploy the model.
does_model_exists(name[, raise_error, ...]): Checks whether the model is stored in the Vertica database.
drop(): Drops the model from the Vertica database.
export_models(name, path[, kind]): Exports machine learning models.
fit(input_relation[, X, return_report]): Trains the model.
get_attributes([attr_name]): Returns the model attributes.
get_match_index(x, col_list[, str_check]): Returns the matching index.
Returns the parameters of the model.
get_plotting_lib([class_name, chart, ...]): Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.
get_vertica_attributes([attr_name]): Returns the model Vertica attributes.
import_models(path[, schema, kind]): Imports machine learning models.
register(registered_name[, raise_error]): Registers the model and adds it to the in-DB model versioning environment with a status of 'under_review'.
set_params([parameters]): Sets the parameters of the model.
Summarizes the model.
to_binary(path): Exports the model to the Vertica Binary format.
to_pmml(path): Exports the model to PMML.
to_python([return_proba, ...]): Returns the Python function needed for in-memory scoring without using built-in Vertica functions.
to_sql([X, return_proba, ...]): Returns the SQL code needed to deploy the model without using built-in Vertica functions.
to_tf(path): Exports the model to the Frozen Graph format (TensorFlow).
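Several of the methods listed above can be combined to inspect a trained model. The sketch below is only an illustration: describe_fitted_model is a hypothetical helper, it expects an object that has already been trained with fit, and it relies solely on the deploySQL and get_attributes signatures shown in the summary.

```python
def describe_fitted_model(model, columns):
    """Collect the deployment SQL and the attribute listing of a fitted model.

    model: an already trained object exposing deploySQL and get_attributes.
    columns: list of column names passed to deploySQL as X.
    """
    return {
        "sql": model.deploySQL(X=columns),       # SQL code needed to deploy the model
        "attributes": model.get_attributes(),    # model attributes
    }
```

Calling model.drop() afterwards would remove the model from the Vertica database, so it is left out of the helper on purpose.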
Attributes
object_type