Loading...

verticapy.machine_learning.model_selection.hp_tuning.randomized_search_cv#

verticapy.machine_learning.model_selection.hp_tuning.randomized_search_cv(estimator: VerticaModel, input_relation: str | vDataFrame, X: str | list[str], y: str, metric: str = 'auto', cv: int = 3, average: Literal['binary', 'micro', 'macro', 'weighted'] = 'weighted', pos_label: bool | float | str | timedelta | datetime | None = None, cutoff: float = -1, nbins: int = 1000, lmax: int = 4, optimized_grid: int = 1, print_info: bool = True) TableSample#

Computes the K-Fold randomized search of an estimator.

estimator: VerticaModel

Vertica estimator with a fit method.

input_relation: SQLRelation

Relation used to train the model.

X: SQLColumns

list of the predictor columns.

y: str

Response Column.

metric: str, optional

Metric used for the model evaluation.

  • auto:

    logloss for classification & RMSE for regression.

For Classification

  • accuracy:

    Accuracy.

    \[Accuracy = \frac{TP + TN}{TP + TN + FP + FN}\]
  • auc:

    Area Under the Curve (ROC).

    \[AUC = \int_{0}^{1} TPR(FPR) \, dFPR\]
  • ba:

    Balanced Accuracy.

    \[BA = \frac{TPR + TNR}{2}\]
  • bm:

    Informedness

    \[BM = TPR + TNR - 1\]
  • csi:

    Critical Success Index

    \[index = \frac{TP}{TP + FN + FP}\]
  • f1:

    F1 Score .. math:

    F_1 Score = 2 \times 
    

rac{Precision times Recall}{Precision + Recall}

  • fdr:

    False Discovery Rate

    \[FDR = 1 - PPV\]
  • fm:

    Fowlkes-Mallows index

    \[FM = \sqrt{PPV * TPR}\]
  • fnr:

    False Negative Rate

    \[FNR = \frac{FN}{FN + TP}\]
  • for:

    False Omission Rate

    \[FOR = 1 - NPV\]
  • fpr:

    False Positive Rate

    \[FPR = \frac{FP}{FP + TN}\]
  • logloss:

    Log Loss

    \[Loss = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)\]
  • lr+:

    Positive Likelihood Ratio.

    \[LR+ = \frac{TPR}{FPR}\]
  • lr-:

    Negative Likelihood Ratio.

    \[LR- = \frac{FNR}{TNR}\]
  • dor:

    Diagnostic Odds Ratio.

    \[DOR = \frac{TP \times TN}{FP \times FN}\]
  • mcc:

    Matthews Correlation Coefficient

  • mk:

    Markedness

    \[MK = PPV + NPV - 1\]
  • npv:

    Negative Predictive Value

    \[NPV = \frac{TN}{TN + FN}\]
  • prc_auc:

    Area Under the Curve (PRC)

    \[AUC = \int_{0}^{1} Precision(Recall) \, dRecall\]
  • precision:

    Precision

    \[TP / (TP + FP)\]
  • pt:

    Prevalence Threshold.

    \[\frac{\sqrt{FPR}}{\sqrt{TPR} + \sqrt{FPR}}\]
  • recall:

    Recall.

    \[TP / (TP + FN)\]
  • specificity:

    Specificity.

    \[TN / (TN + FP)\]

For Regression

  • max:

    Max Error.

    \[ME = \max_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
  • mae:

    Mean Absolute Error.

    \[MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
  • median:

    Median Absolute Error.

    \[MedAE = \text{median}_{i=1}^{n} \left| y_i - \hat{y}_i \right|\]
  • mse:

    Mean Squared Error.

    \[MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2\]
  • msle:

    Mean Squared Log Error.

    \[MSLE = \frac{1}{n} \sum_{i=1}^{n} (\log(1 + y_i) - \log(1 + \hat{y}_i))^2\]
  • r2:

    R squared coefficient.

    \[R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}\]
  • r2a:

    R2 adjusted

    \[\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}\]
  • var:

    Explained Variance.

    \[VAR = 1 - \frac{Var(y - \hat{y})}{Var(y)}\]
  • rmse:

    Root-mean-squared error

    \[RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}\]
cv: int, optional

Number of folds.

average: str, optional

The method used to compute the final score for multiclass -classification.

  • binary:

    considers one of the classes as positive and use the binary confusion matrix to compute the score.

  • micro:

    positive and negative values globally.

  • macro:

    average of the score of each class.

  • weighted:

    weighted average of the score of each class.

pos_label: PythonScalar, optional

The main class to be considered as positive (classification only).

cutoff: float, optional

The model cutoff (classification only).

nbins: int, optional

Number of bins used to compute the different parameters categories.

lmax: int, optional

Maximum length of each parameter list.

optimized_grid: int, optional

If set to 0, the randomness is based on the input parameters. If set to 1, the randomness is limited to some parameters while others are picked based on a default grid. If set to 2, there is no randomness and a default grid is returned.

print_info: bool, optional

If set to True, prints the model information at each step.

TableSample

result of the randomized search.

We import verticapy:

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

For this example, we will use the Wine Quality dataset.

import verticapy.datasets as vpd

data = vpd.load_winequality()
123
fixed_acidity
Numeric(8)
123
volatile_acidity
Numeric(9)
123
citric_acid
Numeric(8)
123
residual_sugar
Numeric(9)
123
chlorides
Float(22)
123
free_sulfur_dioxide
Numeric(9)
123
total_sulfur_dioxide
Numeric(9)
123
density
Float(22)
123
pH
Numeric(8)
123
sulphates
Numeric(8)
123
alcohol
Float(22)
123
quality
Integer
123
good
Integer
Abc
color
Varchar(20)
13.80.310.0211.10.03620.0114.00.992483.750.4412.460white
23.90.2250.44.20.0329.0118.00.9893.570.3612.881white
34.20.170.361.80.02993.0161.00.989993.650.8912.071white
44.20.2150.235.10.04164.0157.00.996883.420.448.030white
54.40.320.394.30.0331.0127.00.989043.460.3612.881white
64.40.460.12.80.02431.0111.00.988163.480.3413.160white
74.40.540.095.10.03852.097.00.990223.410.412.271white
84.50.190.210.950.03389.0159.00.993323.340.428.050white
94.60.4450.01.40.05311.0178.00.994263.790.5510.250white
104.60.520.152.10.0548.065.00.99343.90.5613.140red
114.70.1450.291.00.04235.090.00.99083.760.4911.360white
124.70.3350.141.30.03669.0168.00.992123.470.4610.550white
134.70.4550.181.90.03633.0106.00.987463.210.8314.071white
144.70.60.172.30.05817.0106.00.99323.850.612.960red
154.70.670.091.00.025.09.00.987223.30.3413.650white
164.70.7850.03.40.03623.0134.00.989813.530.9213.860white
174.80.130.321.20.04240.098.00.98983.420.6411.871white
184.80.170.282.90.0322.0111.00.99023.380.3411.371white
194.80.210.2110.20.03717.0112.00.993243.660.4812.271white
204.80.2250.381.20.07447.0130.00.991323.310.410.360white
214.80.260.2310.60.03423.0111.00.992743.460.2811.571white
224.80.290.231.10.04438.0180.00.989243.280.3411.960white
234.80.330.06.50.02834.0163.00.99373.350.619.950white
244.80.340.06.50.02833.0163.00.99393.360.619.960white
254.80.650.121.10.0134.010.00.992463.320.3613.540white
264.90.2350.2711.750.0334.0118.00.99543.070.59.460white
274.90.330.311.20.01639.0150.00.987133.330.5914.081white
284.90.3350.141.30.03669.0168.00.992123.470.4610.466666666666750white
294.90.3350.141.30.03669.0168.00.992123.470.4610.466666666666750white
304.90.3450.341.00.06832.0143.00.991383.240.410.150white
314.90.3450.341.00.06832.0143.00.991383.240.410.150white
324.90.420.02.10.04816.042.00.991543.710.7414.071red
334.90.470.171.90.03560.0148.00.989643.270.3511.560white
345.00.170.561.50.02624.0115.00.99063.480.3910.871white
355.00.20.41.90.01520.098.00.98973.370.5512.0560white
365.00.2350.2711.750.0334.0118.00.99543.070.59.460white
375.00.240.195.00.04317.0101.00.994383.670.5710.050white
385.00.240.212.20.03931.0100.00.990983.690.6211.760white
395.00.240.341.10.03449.0158.00.987743.320.3213.171white
405.00.2550.222.70.04346.0153.00.992383.750.7611.360white
415.00.270.324.50.03258.0178.00.989563.450.3112.671white
425.00.270.324.50.03258.0178.00.989563.450.3112.671white
435.00.270.41.20.07642.0124.00.992043.320.4710.160white
445.00.290.545.70.03554.0155.00.989763.270.3412.981white
455.00.30.333.70.0354.0173.00.98873.360.313.071white
465.00.310.06.40.04643.0166.00.9943.30.639.960white
475.00.330.161.50.04910.097.00.99173.480.4410.760white
485.00.330.161.50.04910.097.00.99173.480.4410.760white
495.00.330.161.50.04910.097.00.99173.480.4410.760white
505.00.330.184.60.03240.0124.00.991143.180.411.060white
515.00.330.2311.80.0323.0158.00.993223.410.6411.860white
525.00.350.257.80.03124.0116.00.992413.390.411.360white
535.00.350.257.80.03124.0116.00.992413.390.411.360white
545.00.380.011.60.04826.060.00.990843.70.7514.060red
555.00.40.54.30.04629.080.00.99023.490.6613.660red
565.00.420.242.00.0619.050.00.99173.720.7414.081red
575.00.440.0418.60.03938.0128.00.99853.370.5710.260white
585.00.4550.181.90.03633.0106.00.987463.210.8314.071white
595.00.550.148.30.03235.0164.00.99183.530.5112.581white
605.00.610.121.30.00965.0100.00.98743.260.3713.550white
615.00.740.01.20.04116.046.00.992584.010.5912.560red
625.01.020.041.40.04541.085.00.99383.750.4810.540red
635.01.040.241.60.0532.096.00.99343.740.6211.550red
645.10.110.321.60.02812.090.00.990083.570.5212.260white
655.10.140.250.70.03915.089.00.99193.220.439.260white
665.10.1650.225.70.04742.0146.00.99343.180.559.960white
675.10.210.281.40.04748.0148.00.991683.50.4910.450white
685.10.230.181.00.05313.099.00.989563.220.3911.550white
695.10.250.361.30.03540.078.00.98913.230.6412.171white
705.10.260.331.10.02746.0113.00.989463.350.4311.471white
715.10.260.346.40.03426.099.00.994493.230.419.260white
725.10.290.288.30.02627.0107.00.993083.360.3711.060white
735.10.290.288.30.02627.0107.00.993083.360.3711.060white
745.10.30.32.30.04840.0150.00.989443.290.4612.260white
755.10.3050.131.750.03617.073.00.993.40.5112.333333333333350white
765.10.310.30.90.03728.0152.00.9923.540.5610.160white
775.10.330.221.60.02718.089.00.98933.510.3812.571white
785.10.330.221.60.02718.089.00.98933.510.3812.571white
795.10.330.221.60.02718.089.00.98933.510.3812.571white
805.10.330.276.70.02244.0129.00.992213.360.3911.071white
815.10.350.266.80.03436.0120.00.991883.380.411.560white
825.10.350.266.80.03436.0120.00.991883.380.411.560white
835.10.350.266.80.03436.0120.00.991883.380.411.560white
845.10.390.211.70.02715.072.00.98943.50.4512.560white
855.10.420.01.80.04418.088.00.991573.680.7313.671red
865.10.420.011.50.01725.0102.00.98943.380.3612.371white
875.10.470.021.30.03418.044.00.99213.90.6212.860red
885.10.510.182.10.04216.0101.00.99243.460.8712.971red
895.10.520.062.70.05230.079.00.99323.320.439.350white
905.10.5850.01.70.04414.086.00.992643.560.9412.971red
915.20.1550.331.60.02813.059.00.989753.30.8411.981white
925.20.1550.331.60.02813.059.00.989753.30.8411.981white
935.20.160.340.80.02926.077.00.991553.250.5110.160white
945.20.170.270.70.0311.068.00.992183.30.419.850white
955.20.1850.221.00.0347.0123.00.992183.550.4410.1560white
965.20.20.273.20.04716.093.00.992353.440.5310.171white
975.20.210.311.70.04817.061.00.989533.240.3712.071white
985.20.220.466.20.06641.0187.00.993623.190.429.7333333333333350white
995.20.240.157.10.04332.0134.00.993783.240.489.960white
1005.20.240.453.80.02721.0128.00.9923.550.4911.281white
Rows: 1-100 | Columns: 14

Note

VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets, which provides detailed information on each dataset and how to use them effectively. These datasets are invaluable resources for honing your data analysis and machine learning skills within the VerticaPy environment.

Next, we can initialize a LogisticRegression model:

from verticapy.machine_learning.vertica import LogisticRegression

model = LogisticRegression()

Now we can conveniently use the randomized_search_cv() function to find the K-Fold randomized search of an estimator.

from verticapy.machine_learning.model_selection import randomized_search_cv

result = randomized_search_cv(
    model,
    input_relation = data,
    X = [
        "fixed_acidity",
        "volatile_acidity",
        "citric_acid",
        "residual_sugar",
        "chlorides",
        "density",
    ],
    y = "good",
    cv = 3,
    metric = "auc",
    lmax = 5,
)
avg_score
avg_train_score
avg_time
score_std
score_train_std
10.680443554463290.67670276512658340.46110463142395020.0032065184204181110.0011242913073000722
20.56043011455027260.55961476633354750.40858070055643720.015017099680400080.007780571365247908
30.50.50.29779060681660970.00.0
40.50.50.303338845570882140.00.0
50.50.50.29463299115498860.00.0
Rows: 1-5 | Columns: 6

Note

randomized_search_cv() works almost the same as grid_search_cv(). The only difference is that the function will generate a random grid of parameters at the beginning.

See also

grid_search_cv() : Computes the k-fold grid search of an estimator.