verticapy.machine_learning.vertica.automl.AutoML
- class verticapy.machine_learning.vertica.automl.AutoML(name: str | None = None, overwrite_model: bool = False, estimator: list | str = 'fast', estimator_type: Literal['auto', 'regressor', 'binary', 'multi'] = 'auto', metric: str = 'auto', cv: int = 3, pos_label: bool | float | str | timedelta | datetime | None = None, cutoff: float = -1, nbins: int = 100, lmax: int = 5, optimized_grid: int = 2, stepwise: bool = True, stepwise_criterion: Literal['aic', 'bic'] = 'aic', stepwise_direction: Literal['forward', 'backward'] = 'backward', stepwise_max_steps: int = 100, stepwise_x_order: Literal['pearson', 'spearman', 'random', 'none'] = 'pearson', preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False}, print_info: bool = True)
Tests multiple models to find those that maximize the input score.
Parameters
- name: str, optional
Name of the model.
- overwrite_model: bool, optional
If set to True, training a model with the same name as an existing model overwrites the existing model.
- estimator: list / ‘native’ / ‘all’ / ‘fast’ / object
List of Vertica estimators with a fit method. Alternatively, you can specify ‘native’ for all native Vertica models, ‘all’ for all VerticaPy models, and ‘fast’ for quick modeling.
- estimator_type: str, optional
Estimator type.
  - auto: Automatically detects the estimator type.
  - regressor: The estimator is used to perform a regression.
  - binary: The estimator is used to perform a binary classification.
  - multi: The estimator is used to perform a multiclass classification.
- metric: str, optional
Metric used for the model evaluation.
  - auto: logloss for classification and RMSE for regression.
  - For Classification:
    - accuracy: Accuracy
    - auc: Area Under the Curve (ROC)
    - ba: Balanced Accuracy = (tpr + tnr) / 2
    - bm: Informedness = tpr + tnr - 1
    - csi: Critical Success Index = tp / (tp + fn + fp)
    - f1: F1 Score
    - fdr: False Discovery Rate = 1 - ppv
    - fm: Fowlkes–Mallows index = sqrt(ppv * tpr)
    - fnr: False Negative Rate = fn / (fn + tp)
    - for: False Omission Rate = 1 - npv
    - fpr: False Positive Rate = fp / (fp + tn)
    - logloss: Log Loss
    - lr+: Positive Likelihood Ratio = tpr / fpr
    - lr-: Negative Likelihood Ratio = fnr / tnr
    - dor: Diagnostic Odds Ratio
    - mcc: Matthews Correlation Coefficient
    - mk: Markedness = ppv + npv - 1
    - npv: Negative Predictive Value = tn / (tn + fn)
    - prc_auc: Area Under the Curve (PRC)
    - precision: Precision = tp / (tp + fp)
    - pt: Prevalence Threshold = sqrt(fpr) / (sqrt(tpr) + sqrt(fpr))
    - recall: Recall = tp / (tp + fn)
    - specificity: Specificity = tn / (tn + fp)
  - For Regression:
    - max: Max error
    - mae: Mean absolute error
    - median: Median absolute error
    - mse: Mean squared error
    - msle: Mean squared log error
    - r2: R-squared coefficient
    - r2a: R2 adjusted
    - rmse: Root-mean-squared error
    - var: Explained variance
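The classification formulas listed for `metric` can be checked by hand from the four confusion-matrix counts. The following is a minimal sketch in plain Python, not VerticaPy's implementation (which computes these in-database). The function name `classification_metrics` is hypothetical, and the `accuracy` and `f1` expressions are the standard textbook definitions rather than formulas stated in this page. The sketch assumes no denominator is zero.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Evaluate the listed formulas from confusion-matrix counts.

    Hypothetical helper for illustration only; it assumes every
    denominator below is non-zero.
    """
    tpr = tp / (tp + fn)  # recall
    tnr = tn / (tn + fp)  # specificity
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # textbook definition
        "ba": (tpr + tnr) / 2,
        "bm": tpr + tnr - 1,
        "csi": tp / (tp + fn + fp),
        "f1": 2 * ppv * tpr / (ppv + tpr),  # textbook definition
        "fdr": 1 - ppv,
        "fm": math.sqrt(ppv * tpr),
        "fnr": fnr,
        "for": 1 - npv,
        "fpr": fpr,
        "lr+": tpr / fpr,
        "lr-": fnr / tnr,
        "mk": ppv + npv - 1,
        "npv": npv,
        "precision": ppv,
        "pt": math.sqrt(fpr) / (math.sqrt(tpr) + math.sqrt(fpr)),
        "recall": tpr,
        "specificity": tnr,
    }


# Example counts chosen arbitrarily for illustration.
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
```

Keys in the returned dictionary match the metric names accepted by the `metric` parameter, so the sketch can serve as a quick reference when choosing which score to optimize.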
- cv: int, optional
Number of folds.
- pos_label: PythonScalar, optional
The main class to be considered as positive (classification only).
- cutoff: float, optional
The model cutoff (classification only).
- nbins: int, optional
Number of bins used to compute the different parameter categories.
- lmax: int, optional
Maximum length of each parameter list.
- optimized_grid: int, optional
If set to zero, the randomness is based on the input parameters. If set to one, the randomness is limited to some parameters while others are picked based on a default grid. If set to two, no randomness is used and a default grid is returned.
- stepwise: bool, optional
If True, the stepwise algorithm is used to determine the final model list of parameters.
- stepwise_criterion: str, optional
Criterion used by the stepwise algorithm when selecting the final estimator.
  - aic: Akaike’s information criterion
  - bic: Bayesian information criterion
- stepwise_direction: str, optional
Direction to start the stepwise search, either ‘backward’ or ‘forward’.
- stepwise_max_steps: int, optional
The maximum number of steps the stepwise algorithm may take when selecting the final estimator.
- stepwise_x_order: str, optional
Method for preprocessing X before using the stepwise algorithm.
  - pearson: X is ordered based on Pearson’s correlation coefficient.
  - spearman: X is ordered based on Spearman’s rank correlation coefficient.
  - random: Shuffles the vector X before applying the stepwise algorithm.
  - none: Does not change the order of X.
- preprocess_data: bool, optional
If True, the data will be preprocessed.
- preprocess_dict: dict, optional
Dictionary to pass to the AutoDataPrep class in order to preprocess the data before training the models.
- print_info: bool
If True, prints the model information at each step.
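As a rough illustration of how the parameters above fit together, the sketch below collects the documented defaults into a dictionary and wraps construction and training in a helper. The names `AUTOML_DEFAULTS`, `automl_kwargs`, and `train_automl` are hypothetical and not part of VerticaPy. Only the import path, the constructor parameters, and the `fit(input_relation, X, y)` call come from this page. The import is deferred so the parameter handling can be tried without VerticaPy installed; calling `train_automl` itself requires VerticaPy and an open Vertica connection.

```python
import copy

# Defaults copied from the documented AutoML signature.
AUTOML_DEFAULTS = {
    "name": None,
    "overwrite_model": False,
    "estimator": "fast",
    "estimator_type": "auto",
    "metric": "auto",
    "cv": 3,
    "pos_label": None,
    "cutoff": -1,
    "nbins": 100,
    "lmax": 5,
    "optimized_grid": 2,
    "stepwise": True,
    "stepwise_criterion": "aic",
    "stepwise_direction": "backward",
    "stepwise_max_steps": 100,
    "stepwise_x_order": "pearson",
    "preprocess_data": True,
    "preprocess_dict": {"identify_ts": False},
    "print_info": True,
}


def automl_kwargs(**overrides) -> dict:
    """Return the documented defaults with the given overrides applied.

    Hypothetical helper: rejects names that are not documented parameters.
    """
    unknown = set(overrides) - set(AUTOML_DEFAULTS)
    if unknown:
        raise TypeError(f"Unknown AutoML parameter(s): {sorted(unknown)}")
    params = copy.deepcopy(AUTOML_DEFAULTS)  # avoid sharing preprocess_dict
    params.update(overrides)
    return params


def train_automl(input_relation, X, y, **overrides):
    """Build and fit an AutoML model (needs VerticaPy and a live connection)."""
    # Deferred import so the rest of this sketch runs without VerticaPy.
    from verticapy.machine_learning.vertica.automl import AutoML

    model = AutoML(**automl_kwargs(**overrides))
    model.fit(input_relation, X=X, y=y)
    return model


# Example: a 5-fold search scored by AUC, with a forward stepwise pass.
params = automl_kwargs(metric="auc", cv=5, stepwise_direction="forward")
```

A call such as `train_automl("public.some_table", X=["col_a", "col_b"], y="target", metric="auc")` (table and column names are placeholders) would return the fitted object, whose `best_model_` and `model_grid_` attributes are described below.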
Attributes
- preprocess_: object
Model used to preprocess the data.
- best_model_: object
Most efficient model found during the search.
- model_grid_: TableSample
Grid containing the different models information.
- __init__(name: str | None = None, overwrite_model: bool = False, estimator: list | str = 'fast', estimator_type: Literal['auto', 'regressor', 'binary', 'multi'] = 'auto', metric: str = 'auto', cv: int = 3, pos_label: bool | float | str | timedelta | datetime | None = None, cutoff: float = -1, nbins: int = 100, lmax: int = 5, optimized_grid: int = 2, stepwise: bool = True, stepwise_criterion: Literal['aic', 'bic'] = 'aic', stepwise_direction: Literal['forward', 'backward'] = 'backward', stepwise_max_steps: int = 100, stepwise_x_order: Literal['pearson', 'spearman', 'random', 'none'] = 'pearson', preprocess_data: bool = True, preprocess_dict: dict = {'identify_ts': False}, print_info: bool = True) -> None
Must be overridden in the child class
Methods
- __init__([name, overwrite_model, estimator, ...]): Must be overridden in the child class
- contour([nbins, chart]): Draws the model's contour plot.
- deploySQL([X]): Returns the SQL code needed to deploy the model.
- does_model_exists(name[, raise_error, ...]): Checks whether the model is stored in the Vertica database.
- drop(): Drops the model from the Vertica database.
- export_models(name, path[, kind]): Exports machine learning models.
- features_importance([chart]): Computes the model's features importance.
- fit(input_relation[, X, y, return_report]): Trains the model.
- get_attributes([attr_name]): Returns the model attributes.
- get_match_index(x, col_list[, str_check]): Returns the matching index.
- get_params(): Returns the parameters of the model.
- get_plotting_lib([class_name, chart, ...]): Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.
- get_vertica_attributes([attr_name]): Returns the model attribute.
- import_models(path[, schema, kind]): Imports machine learning models.
- plot([mltype, chart]): Draws the AutoML plot.
- register(registered_name[, raise_error]): Registers the model and adds it to the in-DB Model versioning environment with a status of 'under_review'.
- set_params([parameters]): Sets the parameters of the model.
- summarize(): Summarizes the model.
- to_binary(path): Exports the model to the Vertica Binary format.
- to_memmodel(): Converts the model to an InMemory object that can be used for different types of predictions.
- to_pmml(path): Exports the model to PMML.
- to_python([return_proba, ...]): Returns the Python function needed for in-memory scoring without using built-in Vertica functions.
- to_sql([X, return_proba, ...]): Returns the SQL code needed to deploy the model without using built-in Vertica functions.
- to_tf(path): Exports the model to the Frozen Graph format (TensorFlow).
Attributes