
verticapy.machine_learning.vertica.automl.AutoML

class verticapy.machine_learning.vertica.automl.AutoML(name: str | None = None, overwrite_model: bool = False, estimator: list | str = 'fast', estimator_type: Literal['auto', 'regressor', 'binary', 'multi'] = 'auto', metric: str = 'auto', cv: int = 3, pos_label: bool | float | str | timedelta | datetime | None = None, cutoff: float = -1, nbins: int = 100, lmax: int = 5, optimized_grid: int = 2, stepwise: bool = True, stepwise_criterion: Literal['aic', 'bic'] = 'aic', stepwise_direction: Literal['forward', 'backward'] = 'backward', stepwise_max_steps: int = 100, stepwise_x_order: Literal['pearson', 'spearman', 'random', 'none'] = 'pearson', preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False}, print_info: bool = True)

Tests multiple models to find those that maximize the input score.

Parameters

name: str, optional

Name of the model.

overwrite_model: bool, optional

If set to True, training a model with the same name as an existing model overwrites the existing model.

estimator: list / ‘native’ / ‘all’ / ‘fast’ / object

List of Vertica estimators with a fit method. Alternatively, you can specify ‘native’ for all native Vertica models, ‘all’ for all VerticaPy models, and ‘fast’ for quick modeling.

estimator_type: str, optional

Estimator type.

auto : Automatically detects the estimator type.
regressor : The estimator is used to perform a regression.
binary : The estimator is used to perform a binary classification.
multi : The estimator is used to perform a multiclass classification.

metric: str, optional

Metric used for the model evaluation.

auto : logloss for classification and RMSE for regression.

For Classification:

accuracy : Accuracy
auc : Area Under the Curve (ROC)
ba : Balanced Accuracy = (tpr + tnr) / 2
bm : Informedness = tpr + tnr - 1
csi : Critical Success Index = tp / (tp + fn + fp)
f1 : F1 Score
fdr : False Discovery Rate = 1 - ppv
fm : Fowlkes–Mallows index = sqrt(ppv * tpr)
fnr : False Negative Rate = fn / (fn + tp)
for : False Omission Rate = 1 - npv
fpr : False Positive Rate = fp / (fp + tn)
logloss : Log Loss
lr+ : Positive Likelihood Ratio = tpr / fpr
lr- : Negative Likelihood Ratio = fnr / tnr
dor : Diagnostic Odds Ratio
mcc : Matthews Correlation Coefficient
mk : Markedness = ppv + npv - 1
npv : Negative Predictive Value = tn / (tn + fn)
prc_auc : Area Under the Curve (PRC)
precision : Precision = tp / (tp + fp)
pt : Prevalence Threshold = sqrt(fpr) / (sqrt(tpr) + sqrt(fpr))
recall : Recall = tp / (tp + fn)
specificity : Specificity = tn / (tn + fp)

For Regression:

max : Max error
mae : Mean absolute error
median : Median absolute error
mse : Mean squared error
msle : Mean squared log error
r2 : R-squared coefficient
r2a : R2 adjusted
rmse : Root-mean-squared error
var : Explained variance
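
Most of the classification metrics above are closed-form functions of the confusion-matrix counts tp, fp, tn and fn. The plain-Python sketch below evaluates several of them from hypothetical counts, which can help when sanity-checking a reported score. The helper name confusion_metrics is illustrative only and is not part of VerticaPy, which computes these metrics inside Vertica.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Evaluate several of the listed classification metrics from raw counts.

    Illustrative sketch only (hypothetical helper, not a VerticaPy function).
    """
    tpr = tp / (tp + fn)  # recall / sensitivity
    tnr = tn / (tn + fp)  # specificity
    fpr = fp / (fp + tn)  # false positive rate
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "ba": (tpr + tnr) / 2,
        "bm": tpr + tnr - 1,
        "csi": tp / (tp + fn + fp),
        "fm": math.sqrt(ppv * tpr),
        "mk": ppv + npv - 1,
        "lr+": tpr / fpr,
        "pt": math.sqrt(fpr) / (math.sqrt(tpr) + math.sqrt(fpr)),
        "precision": ppv,
        "recall": tpr,
        "specificity": tnr,
    }


# Hypothetical confusion matrix: 40 TP, 10 FP, 45 TN, 5 FN.
scores = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
print(round(scores["ba"], 4), round(scores["csi"], 4))
```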

cv: int, optional

Number of folds.
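
With cv folds, each candidate model is trained cv times, each time holding out a different fold for scoring, and the per-fold scores are then aggregated (for example, averaged). The plain-Python sketch below illustrates that partitioning on row indices. The helpers k_fold_indices and cross_validate_mean are hypothetical, the folds are contiguous and unshuffled for simplicity, and VerticaPy performs the actual splitting and scoring in the database.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_rows: int, cv: int = 3) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n_rows-1 into `cv` contiguous (train, test) pairs."""
    indices = list(range(n_rows))
    # The first n_rows % cv folds get one extra row, so sizes differ by at most 1.
    sizes = [n_rows // cv + (1 if i < n_rows % cv else 0) for i in range(cv)]
    folds = []
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def cross_validate_mean(
    n_rows: int,
    cv: int,
    score_fold: Callable[[Sequence[int], Sequence[int]], float],
) -> float:
    """Average the score that `score_fold(train, test)` returns on each fold."""
    scores = [score_fold(train, test) for train, test in k_fold_indices(n_rows, cv)]
    return sum(scores) / len(scores)


for train, test in k_fold_indices(10, cv=3):
    print(len(train), len(test))
```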

pos_label: PythonScalar, optional

The main class to be considered as positive (classification only).

cutoff: float, optional

The model cutoff (classification only).

nbins: int, optional

Number of bins used to compute the different parameter categories.

lmax: int, optional

Maximum length of each parameter list.

optimized_grid: int, optional

If set to zero, the randomness is based on the input parameters. If set to one, the randomness is limited to some parameters while others are picked based on a default grid. If set to two, no randomness is used and a default grid is returned.
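
The interaction of nbins, lmax and optimized_grid can be pictured with the hypothetical sketch below. It assumes that a numeric parameter's range is discretized into nbins evenly spaced values, that at most lmax values are kept per parameter, and that the three optimized_grid modes choose between random draws and a fixed default list as described above. The function build_grid, its ranges/defaults/random_params arguments, and the parameter names C and tol are illustrative assumptions, not VerticaPy's internal grid generator.

```python
import random
from typing import Dict, List, Sequence, Tuple


def build_grid(
    ranges: Dict[str, Tuple[float, float]],
    defaults: Dict[str, List[float]],
    optimized_grid: int = 2,
    random_params: Sequence[str] = (),
    nbins: int = 100,
    lmax: int = 5,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Hypothetical grid builder illustrating nbins, lmax and optimized_grid.

    optimized_grid=0: every parameter is drawn at random from its binned range.
    optimized_grid=1: only names in `random_params` are drawn at random;
                      the others come from `defaults`.
    optimized_grid=2: no randomness; `defaults` is returned (truncated to lmax).
    Requires nbins >= 2.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    grid: Dict[str, List[float]] = {}
    for name, (low, high) in ranges.items():
        randomize = optimized_grid == 0 or (
            optimized_grid == 1 and name in random_params
        )
        if randomize:
            # Discretize [low, high] into nbins evenly spaced candidate values.
            bins = [low + (high - low) * i / (nbins - 1) for i in range(nbins)]
            # Keep at most lmax distinct values, sorted for readability.
            grid[name] = sorted(rng.sample(bins, min(lmax, nbins)))
        else:
            grid[name] = list(defaults[name][:lmax])
    return grid


# Hypothetical parameter ranges and default grid.
ranges = {"C": (0.01, 10.0), "tol": (1e-6, 1e-3)}
defaults = {"C": [0.1, 1.0, 10.0], "tol": [1e-6, 1e-4]}
print(build_grid(ranges, defaults, optimized_grid=2))
print(build_grid(ranges, defaults, optimized_grid=1, random_params=("C",), lmax=4))
```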

stepwise: bool, optional

If True, the stepwise algorithm is used to determine the final model's list of parameters.

stepwise_criterion: str, optional

Criterion used when performing the stepwise search on the final estimator.

aic : Akaike’s information criterion
bic : Bayesian information criterion

stepwise_direction: str, optional

Direction to start the stepwise search, either ‘backward’ or ‘forward’.

stepwise_max_steps: int, optional

The maximum number of steps to consider when performing the stepwise search on the final estimator.
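
The stepwise options above can be illustrated with a small backward elimination: start from every feature, try dropping each one in turn, keep the drop that lowers the information criterion most, and stop when no drop helps or the step limit is reached. The plain-Python sketch below assumes an ordinary least-squares model and the Gaussian forms AIC = n * ln(RSS / n) + 2k and BIC = n * ln(RSS / n) + k * ln(n), up to an additive constant. It is a generic illustration of the technique, not VerticaPy's stepwise implementation; the helper names are hypothetical, and it covers neither the forward direction nor classifiers.

```python
import math
from typing import Dict, List, Sequence


def ols_rss(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an OLS fit with an intercept."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(aug[k][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for k in range(p):
            if k != c:
                f = aug[k][c] / aug[c][c]
                aug[k] = [a - f * b for a, b in zip(aug[k], aug[c])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    return sum((t - sum(b * x for b, x in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def information_criterion(rss: float, n: int, k: int, criterion: str = "aic") -> float:
    """Gaussian AIC or BIC, up to an additive constant."""
    penalty = 2 * k if criterion == "aic" else k * math.log(n)
    return n * math.log(rss / n) + penalty


def backward_stepwise(
    features: Dict[str, Sequence[float]],
    y: Sequence[float],
    criterion: str = "aic",
    max_steps: int = 100,
) -> List[str]:
    """Greedy backward elimination driven by AIC or BIC."""
    n = len(y)
    selected = list(features)

    def score(names: List[str]) -> float:
        # k counts the intercept plus one coefficient per remaining feature.
        rss = ols_rss([features[f] for f in names], y)
        return information_criterion(rss, n, len(names) + 1, criterion)

    current = score(selected)
    for _ in range(max_steps):
        if not selected:
            break
        # Score every model that drops exactly one remaining feature.
        trials = {f: score([g for g in selected if g != f]) for f in selected}
        drop = min(trials, key=trials.__getitem__)
        if trials[drop] >= current:
            break  # no single removal improves the criterion
        selected.remove(drop)
        current = trials[drop]
    return selected


# Hypothetical data: y is generated from x1 and x2; x3 is not used to build y.
n = 20
x1 = [float(i) for i in range(n)]
x2 = [float((7 * i) % 11) for i in range(n)]
x3 = [float((3 * i) % 5) for i in range(n)]
noise = [0.1 * (((13 * i) % 7) - 3) for i in range(n)]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
result = backward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y)
print(result)
```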

stepwise_x_order: str, optional

Method for preprocessing X before using the stepwise algorithm.

pearson : X is ordered based on Pearson’s correlation coefficient.
spearman : X is ordered based on Spearman’s rank correlation coefficient.
random : Shuffles the vector X before applying the stepwise algorithm.
none : Does not change the order of X.

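
The stepwise_x_order options can be sketched as follows: compute each feature's correlation with the response and sort the features by it. This plain-Python version assumes the ordering is by descending absolute correlation, which is an assumption rather than documented VerticaPy behavior; the helpers pearson, ranks and order_features are hypothetical. Spearman's coefficient is computed as Pearson's coefficient on average ranks.

```python
import math
import random
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def order_features(
    features: Dict[str, Sequence[float]],
    y: Sequence[float],
    x_order: str = "pearson",
    seed: int = 0,
) -> List[str]:
    """Order feature names before a stepwise search (assumed: descending |corr|)."""
    names = list(features)
    if x_order == "none":
        return names
    if x_order == "random":
        random.Random(seed).shuffle(names)
        return names
    if x_order == "pearson":
        scores = {f: abs(pearson(features[f], y)) for f in names}
    elif x_order == "spearman":
        y_ranks = ranks(y)
        scores = {f: abs(pearson(ranks(features[f]), y_ranks)) for f in names}
    else:
        raise ValueError(f"unknown x_order: {x_order!r}")
    return sorted(names, key=scores.__getitem__, reverse=True)


# Hypothetical response and features.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "cubic": [v ** 3 for v in y],         # monotone in y but not linear
    "wiggly": [1.0, 2.1, 2.0, 4.0, 5.0],  # nearly linear, one swapped pair
}
print(order_features(features, y, "pearson"))
print(order_features(features, y, "spearman"))
```

With this data the two orderings differ: Pearson ranks wiggly first (about 0.96 versus 0.94), while Spearman ranks cubic first (1.0 versus 0.9) because the cubic feature is perfectly monotone in y.
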
preprocess_data: bool, optional

If True, the data will be preprocessed.

preprocess_dict: dict, optional

Dictionary to pass to the AutoDataPrep class in order to preprocess the data before training the models.

print_info: bool, optional

If True, prints the model information at each step.

Attributes

preprocess_: object

Model used to preprocess the data.

best_model_: object

Most efficient model found during the search.

model_grid_: TableSample

Grid containing the information of the different models.

__init__(name: str | None = None, overwrite_model: bool = False, estimator: list | str = 'fast', estimator_type: Literal['auto', 'regressor', 'binary', 'multi'] = 'auto', metric: str = 'auto', cv: int = 3, pos_label: bool | float | str | timedelta | datetime | None = None, cutoff: float = -1, nbins: int = 100, lmax: int = 5, optimized_grid: int = 2, stepwise: bool = True, stepwise_criterion: Literal['aic', 'bic'] = 'aic', stepwise_direction: Literal['forward', 'backward'] = 'backward', stepwise_max_steps: int = 100, stepwise_x_order: Literal['pearson', 'spearman', 'random', 'none'] = 'pearson', preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False}, print_info: bool = True) -> None

Must be overridden in the child class

Methods

__init__([name, overwrite_model, estimator, ...])

Must be overridden in the child class

contour([nbins, chart])

Draws the model's contour plot.

deploySQL([X])

Returns the SQL code needed to deploy the model.

does_model_exists(name[, raise_error, ...])

Checks whether the model is stored in the Vertica database.

drop()

Drops the model from the Vertica database.

export_models(name, path[, kind])

Exports machine learning models.

features_importance([chart])

Computes the model's feature importance.

fit(input_relation[, X, y, return_report])

Trains the model.

get_attributes([attr_name])

Returns the model attributes.

get_match_index(x, col_list[, str_check])

Returns the matching index.

get_params()

Returns the parameters of the model.

get_plotting_lib([class_name, chart, ...])

Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.

get_vertica_attributes([attr_name])

Returns the model attribute.

import_models(path[, schema, kind])

Imports machine learning models.

plot([mltype, chart])

Draws the AutoML plot.

register(registered_name[, raise_error])

Registers the model and adds it to the in-DB model versioning environment with a status of 'under_review'.

set_params([parameters])

Sets the parameters of the model.

summarize()

Summarizes the model.

to_binary(path)

Exports the model to the Vertica Binary format.

to_memmodel()

Converts the model to an InMemory object that can be used for different types of predictions.

to_pmml(path)

Exports the model to PMML.

to_python([return_proba, ...])

Returns the Python function needed for in-memory scoring without using built-in Vertica functions.

to_sql([X, return_proba, ...])

Returns the SQL code needed to deploy the model without using built-in Vertica functions.

to_tf(path)

Exports the model to the Frozen Graph format (TensorFlow).

Attributes