verticapy.machine_learning.vertica.automl.AutoML#

class verticapy.machine_learning.vertica.automl.AutoML(name: str | None = None, overwrite_model: bool = False, estimator: list | str = 'fast', estimator_type: Literal['auto', 'regressor', 'binary', 'multi'] = 'auto', metric: str = 'auto', cv: int = 3, pos_label: bool | float | str | timedelta | datetime | None = None, cutoff: float = -1, nbins: int = 100, lmax: int = 5, optimized_grid: int = 2, stepwise: bool = True, stepwise_criterion: Literal['aic', 'bic'] = 'aic', stepwise_direction: Literal['forward', 'backward'] = 'backward', stepwise_max_steps: int = 100, stepwise_x_order: Literal['pearson', 'spearman', 'random', 'none'] = 'pearson', preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False}, print_info: bool = True)#

Tests multiple models to find those that maximize the input score.

Parameters#

name: str, optional

Name of the model.

overwrite_model: bool, optional

If set to True, training a model with the same name as an existing model overwrites the existing model.

estimator: list / ‘native’ / ‘all’ / ‘fast’ / object

List of Vertica estimators with a fit method. Alternatively, you can specify ‘native’ for all native Vertica models, ‘all’ for all VerticaPy models, and ‘fast’ for quick modeling.

estimator_type: str, optional

Estimator Type.

auto : Automatically detects the estimator type.
regressor : The estimator is used to perform a regression.
binary : The estimator is used to perform a binary classification.
multi : The estimator is used to perform a multiclass classification.

metric: str, optional

Metric used for the model evaluation.

auto : logloss for classification and RMSE for regression.

For Classification:

accuracy : Accuracy
auc : Area Under the Curve (ROC)
ba : Balanced Accuracy = (tpr + tnr) / 2
bm : Informedness = tpr + tnr - 1
csi : Critical Success Index = tp / (tp + fn + fp)
f1 : F1 Score
fdr : False Discovery Rate = 1 - ppv
fm : Fowlkes–Mallows index = sqrt(ppv * tpr)
fnr : False Negative Rate = fn / (fn + tp)
for : False Omission Rate = 1 - npv
fpr : False Positive Rate = fp / (fp + tn)
logloss : Log Loss
lr+ : Positive Likelihood Ratio = tpr / fpr
lr- : Negative Likelihood Ratio = fnr / tnr
dor : Diagnostic Odds Ratio
mcc : Matthews Correlation Coefficient
mk : Markedness = ppv + npv - 1
npv : Negative Predictive Value = tn / (tn + fn)
prc_auc : Area Under the Curve (PRC)
precision : Precision = tp / (tp + fp)
pt : Prevalence Threshold = sqrt(fpr) / (sqrt(tpr) + sqrt(fpr))
recall : Recall = tp / (tp + fn)
specificity : Specificity = tn / (tn + fp)
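The formulas in the classification list can be checked by hand from confusion-matrix counts. The sketch below is a plain-Python illustration of those formulas only; the function name `confusion_metrics` and the sample counts are assumptions for the example, and it is not VerticaPy's in-database implementation.

```python
from math import sqrt


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive several of the listed metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # recall
    tnr = tn / (tn + fp)  # specificity
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    return {
        "ba": (tpr + tnr) / 2,
        "bm": tpr + tnr - 1,
        "csi": tp / (tp + fn + fp),
        "fdr": 1 - ppv,
        "fm": sqrt(ppv * tpr),
        "for": 1 - npv,
        "lr+": tpr / fpr,
        "lr-": fnr / tnr,
        "mk": ppv + npv - 1,
        "pt": sqrt(fpr) / (sqrt(tpr) + sqrt(fpr)),
    }


# Hypothetical counts, chosen only to exercise the formulas.
m = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
```

The function assumes every denominator is non-zero; real data with an empty class would need explicit handling.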

For Regression:

max : Max error
mae : Mean absolute error
median : Median absolute error
mse : Mean squared error
msle : Mean squared log error
r2 : R-squared coefficient
r2a : R2 adjusted
rmse : Root-mean-squared error
var : Explained variance
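As a rough illustration of a few of the regression metrics, the sketch below computes them with their textbook definitions in plain Python. The function name `regression_metrics` and the sample values are assumptions for the example; VerticaPy computes its scores inside the database.

```python
from math import sqrt


def regression_metrics(y_true: list, y_pred: list) -> dict:
    """Textbook definitions of a subset of the listed regression metrics."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    mse = sum(e * e for e in errors) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return {
        "max": max(abs(e) for e in errors),
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": sqrt(mse),
        "r2": 1 - (mse * n) / ss_tot,
    }


# Hypothetical observations and predictions.
scores = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```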

cv: int, optional

Number of folds.
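To make the role of cv concrete, here is a minimal sketch of k-fold splitting on row indices. It assumes a simple round-robin assignment of rows to folds; the helper `kfold_indices` is hypothetical, and how VerticaPy actually partitions the relation is not described in this page.

```python
def kfold_indices(n_rows: int, cv: int = 3) -> list:
    """Split row indices 0..n_rows-1 into `cv` (train, test) pairs."""
    indices = list(range(n_rows))
    # Round-robin assignment: row i goes to fold i % cv.
    folds = [indices[i::cv] for i in range(cv)]
    splits = []
    for i in range(cv):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = kfold_indices(7, cv=3)
```

Each row lands in exactly one test fold, so every candidate model is scored cv times on data it was not trained on.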

pos_label: PythonScalar, optional

The main class to be considered as positive (classification only).

cutoff: float, optional

The model cutoff (classification only).
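A cutoff turns predicted probabilities into class labels. The sketch below assumes the common convention that a probability greater than or equal to the cutoff maps to the positive class; the helper `apply_cutoff` and its `neg_label` argument are hypothetical, and the sketch does not try to reproduce how the default value of -1 in the signature is handled.

```python
def apply_cutoff(probabilities, cutoff=0.5, pos_label=1, neg_label=0):
    """Label each probability as positive when it reaches the cutoff."""
    # Assumption: ">=" is used at the boundary.
    return [pos_label if p >= cutoff else neg_label for p in probabilities]


labels = apply_cutoff([0.1, 0.5, 0.8], cutoff=0.5)
strict = apply_cutoff([0.1, 0.5, 0.8], cutoff=0.7, pos_label="yes", neg_label="no")
```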

nbins: int, optional

Number of bins used to compute the different parameter categories.

lmax: int, optional

Maximum length of each parameter list.

optimized_grid: int, optional

If set to zero, the randomness is based on the input parameters. If set to one, the randomness is limited to some parameters while others are picked based on a default grid. If set to two, no randomness is used and a default grid is returned.
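One way to picture lmax and a non-random grid is as capped per-parameter lists expanded into every combination. The sketch below is a toy illustration under that assumption: `expand_grid` is a hypothetical helper, the parameter names are made up, and truncating each list to its first lmax values is a simplification rather than VerticaPy's grid-generation logic.

```python
from itertools import product


def expand_grid(param_lists: dict, lmax: int = 5) -> list:
    """Cap each candidate list at `lmax` values, then take the Cartesian product."""
    capped = {name: values[:lmax] for name, values in param_lists.items()}
    names = list(capped)
    return [dict(zip(names, combo)) for combo in product(*capped.values())]


# Hypothetical hyperparameter candidates.
grid = expand_grid({"max_depth": [3, 4, 5, 6], "n_estimators": [10, 20]}, lmax=2)
```

Because the grid size is the product of the list lengths, a small lmax keeps the number of models to train manageable.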

stepwise: bool, optional

If True, the stepwise algorithm is used to determine the final model list of parameters.

stepwise_criterion: str, optional

Criterion used when performing the final estimator stepwise.

aic : Akaike’s information criterion
bic : Bayesian information criterion
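For reference, the two criteria in their general likelihood-based form can be sketched as follows, where k is the number of fitted parameters and n the number of observations. This is the textbook definition; VerticaPy may compute a variant (for example one derived from the mean squared error), so treat the functions below as illustrative.

```python
from math import log


def aic(log_likelihood: float, k: int) -> float:
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * log(n) - 2 * log_likelihood


# Hypothetical fit: log-likelihood -100 with 3 parameters on 100 rows.
aic_value = aic(-100.0, k=3)
bic_value = bic(-100.0, k=3, n=100)
```

Lower values are better for both criteria. BIC penalizes additional parameters more heavily than AIC once n exceeds about 7, because ln(n) is then greater than 2.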

stepwise_direction: str, optional

Direction to start the stepwise search, either ‘backward’ or ‘forward’.

stepwise_max_steps: int, optional

The maximum number of steps to be considered when performing the final estimator stepwise.
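The stepwise options can be pictured with a generic greedy backward elimination: start from all features, repeatedly drop the feature whose removal lowers the criterion the most, and stop when nothing improves or the step budget runs out. The sketch below is a plain-Python illustration with a made-up scoring function; `backward_stepwise`, `toy_score`, and the feature names are assumptions, not VerticaPy code.

```python
def backward_stepwise(features, score, max_steps=100):
    """Greedy backward elimination; a lower score is better (as with AIC/BIC)."""
    current = list(features)
    best = score(current)
    for _ in range(max_steps):
        if len(current) <= 1:
            break
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (score([f for f in current if f != drop]), drop) for drop in current
        ]
        cand_score, drop = min(candidates)
        if cand_score >= best:
            break  # no removal improves the criterion
        best = cand_score
        current.remove(drop)
    return current, best


# Toy criterion: a penalty of 2 per feature minus a gain for useful ones.
USEFUL = {"age": 10.0, "income": 6.0}


def toy_score(subset):
    return 2 * len(subset) - sum(USEFUL.get(f, 0.0) for f in subset)


selected, final = backward_stepwise(
    ["age", "income", "noise_a", "noise_b"], toy_score
)
```

A forward search would mirror this loop, starting from an empty set and adding one feature per step.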

stepwise_x_order: str, optional

Method for preprocessing X before using the stepwise algorithm.

pearson : X is ordered based on Pearson’s correlation coefficient.
spearman : X is ordered based on Spearman’s rank correlation coefficient.
random : Shuffles the vector X before applying the stepwise algorithm.
none : Does not change the order of X.
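As an illustration of the 'pearson' option, the sketch below ranks predictor columns by the absolute value of their Pearson correlation with the response, strongest first. Both the use of the absolute value and the descending order are assumptions made for the example; the helpers `pearson` and `order_by_pearson` and the sample columns are hypothetical.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def order_by_pearson(columns: dict, y: list) -> list:
    """Return column names sorted by |correlation with y|, strongest first."""
    return sorted(
        columns, key=lambda name: abs(pearson(columns[name], y)), reverse=True
    )


y = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "weak": [2.0, 1.0, 4.0, 3.0, 3.0],
    "strong": [2.0, 4.0, 6.0, 8.0, 10.0],
    "inverse": [5.0, 4.0, 3.0, 2.5, 1.0],
}
order = order_by_pearson(columns, y)
```

The order matters because a greedy stepwise search considers features in sequence, so different starting orders can lead to different final subsets.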

preprocess_data: bool, optional

If True, the data will be preprocessed.

preprocess_dict: dict, optional

Dictionary to pass to the AutoDataPrep class in order to preprocess the data before training the models.

print_info: bool

If True, prints the model information at each step.

Attributes#

preprocess_: object

Model used to preprocess the data.

best_model_: object

Most efficient model found during the search.

model_grid_: TableSample

Grid containing information about the different models.
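A minimal usage sketch, built only from the constructor signature, the `fit` entry in the method summary, and the attributes listed above. The model name, the chosen parameter values, and the helper names `automl_params` and `run_automl` are assumptions for illustration. `run_automl` needs VerticaPy and a live Vertica connection, so it is defined here but not called.

```python
def automl_params() -> dict:
    """Keyword arguments taken from the documented constructor signature."""
    return {
        "name": "automl_demo",  # hypothetical model name
        "estimator": "fast",
        "estimator_type": "binary",
        "metric": "auc",
        "cv": 3,
        "stepwise": True,
        "stepwise_criterion": "aic",
        "stepwise_direction": "backward",
        "print_info": False,
    }


def run_automl(input_relation, X, y):
    """Fit AutoML on an existing relation; requires a Vertica connection."""
    # Imported lazily so this module loads without VerticaPy installed.
    from verticapy.machine_learning.vertica.automl import AutoML

    model = AutoML(**automl_params())
    model.fit(input_relation, X, y)
    # Attributes documented in the Attributes section.
    return model.best_model_, model.model_grid_


params = automl_params()
```

With a connection in place, a call might look like `run_automl("public.titanic", ["age", "fare"], "survived")`, where the relation and column names are placeholders.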

__init__(name: str | None = None, overwrite_model: bool = False, estimator: list | str = 'fast', estimator_type: Literal['auto', 'regressor', 'binary', 'multi'] = 'auto', metric: str = 'auto', cv: int = 3, pos_label: bool | float | str | timedelta | datetime | None = None, cutoff: float = -1, nbins: int = 100, lmax: int = 5, optimized_grid: int = 2, stepwise: bool = True, stepwise_criterion: Literal['aic', 'bic'] = 'aic', stepwise_direction: Literal['forward', 'backward'] = 'backward', stepwise_max_steps: int = 100, stepwise_x_order: Literal['pearson', 'spearman', 'random', 'none'] = 'pearson', preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False}, print_info: bool = True) → None#

Must be overridden in the child class.

Methods

`__init__`([name, overwrite_model, estimator, ...])	Must be overridden in the child class
`contour`([nbins, chart])	Draws the model's contour plot.
`deploySQL`([X])	Returns the SQL code needed to deploy the model.
`does_model_exists`(name[, raise_error, ...])	Checks whether the model is stored in the Vertica database.
`drop`()	Drops the model from the Vertica database.
`export_models`(name, path[, kind])	Exports machine learning models.
`features_importance`([chart])	Computes the model's features importance.
`fit`(input_relation[, X, y, return_report])	Trains the model.
`get_attributes`([attr_name])	Returns the model attributes.
`get_match_index`(x, col_list[, str_check])	Returns the matching index.
`get_params`()	Returns the parameters of the model.
`get_plotting_lib`([class_name, chart, ...])	Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.
`get_vertica_attributes`([attr_name])	Returns the model attribute.
`import_models`(path[, schema, kind])	Imports machine learning models.
`plot`([mltype, chart])	Draws the AutoML plot.
`register`(registered_name[, raise_error])	Registers the model and adds it to the in-DB model versioning environment with a status of 'under_review'.
`set_params`([parameters])	Sets the parameters of the model.
`summarize`()	Summarizes the model.
`to_binary`(path)	Exports the model to the Vertica Binary format.
`to_memmodel`()	Converts the model to an InMemory object that can be used for different types of predictions.
`to_pmml`(path)	Exports the model to PMML.
`to_python`([return_proba, ...])	Returns the Python function needed for in-memory scoring without using built-in Vertica functions.
`to_sql`([X, return_proba, ...])	Returns the SQL code needed to deploy the model without using built-in Vertica functions.
`to_tf`(path)	Exports the model to the Frozen Graph format (TensorFlow).

Attributes

object_type