randomized_features_search_cv

In [ ]:
randomized_features_search_cv(estimator,
                              input_relation: (str, vDataFrame),
                              X: list,
                              y: str,
                              metric: str = "auto",
                              cv: int = 3,
                              pos_label: (int, float, str) = None,
                              cutoff: float = -1,
                              training_score: bool = True,
                              comb_limit: int = 100,
                              skip_error: bool = True,
                              print_info: bool = True)

Computes the k-fold cross-validation scores of an estimator for randomly chosen combinations of the input features. It can be used to find the feature combination that optimizes the model.
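An illustrative sketch of the search strategy (this is not VerticaPy's actual implementation): enumerate the non-empty subsets of `X`, shuffle them, keep at most `comb_limit` of them, and cross-validate each surviving combination.

```python
# Illustrative sketch of the feature-combination sampling (NOT VerticaPy's
# actual code): build every non-empty subset of the predictors, shuffle,
# and keep at most `comb_limit` combinations to cross-validate.
import itertools
import random

def sample_feature_combinations(X, comb_limit=100, seed=0):
    combos = [list(c)
              for r in range(1, len(X) + 1)
              for c in itertools.combinations(X, r)]
    random.Random(seed).shuffle(combos)
    return combos[:comb_limit]

combos = sample_feature_combinations(["age", "fare", "pclass"])
print(len(combos))  # 3 predictors -> 2**3 - 1 = 7 subsets, all kept (7 <= 100)
```

With only three predictors all seven combinations fit under the default `comb_limit = 100`, which is why the example below scores every subset.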

Parameters

estimator (object)
    Vertica estimator having a fit method and a DB cursor.
input_relation (str / vDataFrame)
    Relation to use to train the model.
X (list)
    List of the predictor columns.
y (str)
    Response column.
metric (str / list, optional)
    Metric used to evaluate the model.
      • auto : logloss for classification & rmse for regression.

    For Classification:
      • accuracy : Accuracy
      • auc : Area Under the Curve (ROC)
      • bm : Informedness = tpr + tnr - 1
      • csi : Critical Success Index = tp / (tp + fn + fp)
      • f1 : F1 Score
      • logloss : Log Loss
      • mcc : Matthews Correlation Coefficient
      • mk : Markedness = ppv + npv - 1
      • npv : Negative Predictive Value = tn / (tn + fn)
      • prc_auc : Area Under the Curve (PRC)
      • precision : Precision = tp / (tp + fp)
      • recall : Recall = tp / (tp + fn)
      • specificity : Specificity = tn / (tn + fp)

    For Regression:
      • max : Max Error
      • mae : Mean Absolute Error
      • median : Median Absolute Error
      • mse : Mean Squared Error
      • msle : Mean Squared Log Error
      • r2 : R-squared coefficient
      • r2a : Adjusted R-squared
      • rmse : Root Mean Squared Error
      • var : Explained Variance
cv (int, optional)
    Number of folds.
pos_label (int / float / str, optional)
    The main class to be considered as positive (classification only).
cutoff (float, optional)
    The model cutoff (classification only).
training_score (bool, optional)
    If set to True, the training score will be computed along with the validation score.
comb_limit (int, optional)
    Maximum number of feature combinations used to train the model.
skip_error (bool, optional)
    If set to True and an error occurs, the error is displayed rather than raised.
print_info (bool, optional)
    If set to True, prints the model information at each step.
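To unpack a few of the less common classification metrics listed above, here they are written out in plain Python from confusion-matrix counts (tp, tn, fp, fn). This is purely a restatement of the formulas, not VerticaPy code.

```python
# Plain-Python versions of some metric formulas listed above, computed from
# confusion-matrix counts; purely to unpack the definitions (not VerticaPy code).
def informedness(tp, tn, fp, fn):
    # bm = tpr + tnr - 1
    return tp / (tp + fn) + tn / (tn + fp) - 1

def markedness(tp, tn, fp, fn):
    # mk = ppv + npv - 1
    return tp / (tp + fp) + tn / (tn + fn) - 1

def critical_success_index(tp, fp, fn):
    # csi = tp / (tp + fn + fp)
    return tp / (tp + fn + fp)

# tpr = 40/50 = 0.8, tnr = 45/50 = 0.9, so bm is roughly 0.7
print(informedness(tp=40, tn=45, fp=5, fn=10))
```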

Returns

tablesample : An object containing the result. For more information, see utilities.tablesample.
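The `avg_*` and `*_std` columns of the returned tablesample are per-fold aggregates. A pure-Python sketch of what `cv = 3` means — the contiguous fold assignment below is an assumption for illustration only, not VerticaPy's internal splitting:

```python
# Sketch of k-fold scoring: split the rows into `cv` folds, score each
# held-out fold, and report the mean / standard deviation of the fold scores,
# i.e. the avg_score / score_std columns of the result. The contiguous fold
# assignment here is an illustrative assumption.
import statistics

def kfold_indices(n_rows, cv=3):
    fold_size, rem = divmod(n_rows, cv)
    folds, start = [], 0
    for i in range(cv):
        end = start + fold_size + (1 if i < rem else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

folds = kfold_indices(10, cv=3)
print([len(f) for f in folds])        # [4, 3, 3]

fold_scores = [0.251, 0.253, 0.254]   # hypothetical per-fold logloss values
print(statistics.mean(fold_scores))   # -> avg_score
print(statistics.stdev(fold_scores))  # -> score_std
```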

Example

In [61]:
from verticapy.learn.linear_model import LogisticRegression
model = LogisticRegression(name = "public.LR_titanic",
                           tol = 1e-4,
                           max_iter = 100, 
                           solver = 'Newton')

from verticapy.learn.model_selection import randomized_features_search_cv
randomized_features_search_cv(model,
                              input_relation = "public.titanic", 
                              X = ["age", "fare", "pclass"], 
                              y = "survived", 
                              cv = 3)
Model: LogisticRegression; Features: ['age']; Test_score: 0.29739591302201435; Train_score: 0.2870846295987417; Time: 0.23079792658487955;
Model: LogisticRegression; Features: ['fare']; Test_score: 0.2622687023390637; Train_score: 0.27165475232364034; Time: 0.25974663098653156;
Model: LogisticRegression; Features: ['pclass']; Test_score: 0.26401495708324; Train_score: 0.25868874967928734; Time: 0.22532439231872559;
Model: LogisticRegression; Features: ['age', 'fare']; Test_score: 0.27261136467610336; Train_score: 0.271655326046594; Time: 0.22174525260925293;
Model: LogisticRegression; Features: ['age', 'pclass']; Test_score: 0.26075270983439364; Train_score: 0.2546633029212083; Time: 0.21323863665262857;
Model: LogisticRegression; Features: ['fare', 'pclass']; Test_score: 0.25272124780890964; Train_score: 0.262128818022629; Time: 0.21834190686543783;
Model: LogisticRegression; Features: ['age', 'fare', 'pclass']; Test_score: 0.2663715923369417; Train_score: 0.25077000202031735; Time: 0.2347706158955892;

Randomized Features Search Selected Model
LogisticRegression; Features: ['fare', 'pclass']; Test_score: 0.25272124780890964; Train_score: 0.262128818022629; Time: 0.21834190686543783;
Out[61]:
  | features                  | avg_score           | avg_train_score     | avg_time            | score_std             | score_train_std
1 | ['fare', 'pclass']        | 0.25272124780890964 | 0.262128818022629   | 0.21834190686543783 | 7.043012041026667e-05 | 0.00017019024915138368
2 | ['age', 'pclass']         | 0.26075270983439364 | 0.2546633029212083  | 0.21323863665262857 | 0.002076905164550426  | 0.0006827360831617877
3 | ['fare']                  | 0.2622687023390637  | 0.27165475232364034 | 0.25974663098653156 | 0.0037831907867619933 | 0.0020373834096355167
4 | ['pclass']                | 0.26401495708324    | 0.25868874967928734 | 0.22532439231872559 | 0.004087209772525585  | 0.0022203907434587234
5 | ['age', 'fare', 'pclass'] | 0.2663715923369417  | 0.25077000202031735 | 0.2347706158955892  | 0.008440420431771319  | 0.004107429392604143
6 | ['age', 'fare']           | 0.27261136467610336 | 0.271655326046594   | 0.22174525260925293 | 0.006434243291596136  | 0.0030117916923140657
7 | ['age']                   | 0.29739591302201435 | 0.2870846295987417  | 0.23079792658487955 | 0.0033343980447430397 | 0.0017523334554747426
Rows: 1-7 | Columns: 6
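The "Selected Model" line above is simply the row with the best cross-validation score; since logloss is an error metric, that is the smallest `avg_score`. Reproducing that selection from the table values:

```python
# Reproducing the model selection from the avg_score column above: logloss is
# minimized, so the best feature combination has the smallest avg_score.
avg_scores = {
    ('fare', 'pclass'):        0.25272124780890964,
    ('age', 'pclass'):         0.26075270983439364,
    ('fare',):                 0.2622687023390637,
    ('pclass',):               0.26401495708324,
    ('age', 'fare', 'pclass'): 0.2663715923369417,
    ('age', 'fare'):           0.27261136467610336,
    ('age',):                  0.29739591302201435,
}
best = min(avg_scores, key=avg_scores.get)
print(best)  # ('fare', 'pclass') -- the combination reported as selected
```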