
verticapy.machine_learning.vertica.ensemble.RandomForestRegressor#

class verticapy.machine_learning.vertica.ensemble.RandomForestRegressor(name: str = None, overwrite_model: bool = False, n_estimators: int = 10, max_features: Literal['auto', 'max'] | int = 'auto', max_leaf_nodes: int | float | Decimal = 1000000000.0, sample: float = 0.632, max_depth: int = 5, min_samples_leaf: int = 1, min_info_gain: int | float | Decimal = 0.0, nbins: int = 32)#

Creates a RandomForestRegressor object using the Vertica RF_REGRESSOR function. It is an ensemble learning method for regression that operates by constructing a multitude of decision trees at training time and outputting the average prediction of the individual trees.
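As a rough illustration (plain NumPy, not VerticaPy code), the ensemble prediction for a regression forest is the mean of the per-tree predictions:

```python
import numpy as np

# Hypothetical per-tree predictions for three rows from a 4-tree forest
tree_predictions = np.array([
    [5.2, 6.1, 5.8],  # tree 1
    [5.4, 6.0, 5.6],  # tree 2
    [5.0, 6.3, 5.9],  # tree 3
    [5.6, 6.0, 5.7],  # tree 4
])

# The forest prediction averages the trees (RF_REGRESSOR does this in-database)
forest_prediction = tree_predictions.mean(axis=0)
```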

Parameters#

name: str, optional

Name of the model. The model is stored in the DB.

overwrite_model: bool, optional

If set to True, training a model with the same name as an existing model overwrites the existing model.

n_estimators: int, optional

The number of trees in the forest, an integer between 1 and 1000, inclusive.

max_features: int | str, optional

The number of randomly chosen features from which to pick the best feature to split a given tree node. It can be an integer or one of the following two methods.

  • auto:

    square root of the total number of predictors.

  • max :

    number of predictors.

max_leaf_nodes: PythonNumber, optional

The maximum number of leaf nodes for a tree in the forest, an integer between 1 and 1e9, inclusive.

sample: float, optional

The portion of the input data set that is randomly selected for training each tree, a float between 0.0 and 1.0, inclusive.

max_depth: int, optional

The maximum depth of each tree, an integer between 1 and 100, inclusive.

min_samples_leaf: int, optional

The minimum number of samples each branch must have after splitting a node, an integer between 1 and 1e6, inclusive. A split that results in remaining samples less than this value is discarded.

min_info_gain: PythonNumber, optional

The minimum threshold for including a split, a float between 0.0 and 1.0, inclusive. A split with information gain less than this threshold is discarded.

nbins: int, optional

Number of bins used to find splits in each column, where more splits leads to a longer runtime but more fine-grained, possibly better splits. Must be an integer between 2 and 1000, inclusive.
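To illustrate how binning constrains the split search (a simplified sketch; Vertica's actual binning strategy may differ), equal-width bin boundaries can serve as the candidate split points for a column:

```python
import numpy as np

# Hypothetical column values; with nbins bins, candidate split points are the
# equal-width boundaries between the min and max. Fewer bins means fewer
# candidate splits to evaluate, hence a shorter runtime.
values = np.array([3.0, 4.5, 5.0, 6.2, 7.9, 9.1])
nbins = 4

edges = np.linspace(values.min(), values.max(), nbins + 1)
candidate_splits = edges[1:-1]  # interior boundaries only
```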

Attributes#

Many attributes are created during the fitting phase.

trees_: list of BinaryTreeRegressor

Tree models are instances of BinaryTreeRegressor, each possessing various attributes. For more detailed information, refer to the documentation for BinaryTreeRegressor.

features_importance_: numpy.array

The importance of features, calculated using MDI (Mean Decreased Impurity). To determine the final score, VerticaPy sums the scores of each tree, normalizes them, and applies an activation function to scale them. Call the features_importance() method once to compute the values; subsequent calls reuse them.

features_importance_trees_: dict of numpy.array

Each element represents the feature importance of tree i, calculated using MDI (Mean Decreased Impurity). Call the features_importance() method once to compute the values; subsequent calls reuse them.

n_estimators_: int

The number of model estimators.

Note

All attributes can be accessed using the get_attributes() method.

Note

Several other attributes can be accessed by using the get_vertica_attributes() method.

Examples#

The following examples provide a basic understanding of usage. For more detailed examples, please refer to the Machine Learning or the Examples section on the website.

Important

Many tree-based models inherit from the RandomForest base class, and it’s recommended to use it directly for access to a wider range of options.

Load data for machine learning#

We import verticapy:

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

For this example, we will use the winequality dataset.

import verticapy.datasets as vpd

data = vpd.load_winequality()
[Output: the winequality vDataFrame — Rows: 1-100 | Columns: 14 (fixed_acidity, volatile_acidity, citric_acid, residual_sugar, chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH, sulphates, alcohol, quality, good, color)]

Note

VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use them effectively. These datasets are invaluable resources for honing your data analysis and machine learning skills within the VerticaPy environment.

You can easily divide your dataset into training and testing subsets using the vDataFrame.train_test_split() method. This is a crucial step when preparing your data for machine learning, as it allows you to evaluate the performance of your models accurately.

data = vpd.load_winequality()
train, test = data.train_test_split(test_size = 0.2)

Warning

In this case, VerticaPy utilizes seeded randomization to guarantee the reproducibility of your data split. However, please be aware that this approach may lead to reduced performance. For a more efficient data split, you can use the vDataFrame.to_db() method to save your results into tables or temporary tables. This will help enhance the overall performance of the process.

Model Initialization#

First we import the RandomForestRegressor model:

from verticapy.machine_learning.vertica import RandomForestRegressor

Then we can create the model:

model = RandomForestRegressor(
    max_features = "auto",
    max_leaf_nodes = 32,
    sample = 0.5,
    max_depth = 3,
    min_samples_leaf = 5,
    min_info_gain = 0.0,
    nbins = 32,
)

Hint

In verticapy 1.0.x and higher, you do not need to specify the model name, as the name is automatically assigned. If you need to re-use the model, you can fetch the model name from the model’s attributes.

Important

The model name is crucial for the model management system and versioning. It’s highly recommended to provide a name if you plan to reuse the model later.

Model Training#

We can now fit the model:

model.fit(
    train,
    [
        "fixed_acidity",
        "volatile_acidity",
        "citric_acid",
        "residual_sugar",
        "chlorides",
        "density",
    ],
    "quality",
    test,
)

Important

To train a model, you can directly use the vDataFrame or the name of the relation stored in the database. The test set is optional and is only used to compute the test metrics. In verticapy, we don’t work using X matrices and y vectors. Instead, we work directly with lists of predictors and the response name.

Features Importance#

We can conveniently get the features importance:

result = model.features_importance()

Note

In models such as RandomForest, feature importance is calculated using the MDI (Mean Decreased Impurity). To determine the final score, VerticaPy sums the scores of each tree, normalizes them and applies an activation function to scale them.
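The aggregation idea can be sketched with NumPy (the exact normalization and activation function VerticaPy applies may differ; the scores below are made up):

```python
import numpy as np

# Hypothetical MDI scores for 3 features across 2 trees
tree_scores = np.array([
    [0.50, 0.30, 0.20],  # tree 1
    [0.40, 0.40, 0.20],  # tree 2
])

# Sum the per-tree scores, then normalize so they total 100
summed = tree_scores.sum(axis=0)
importance = 100 * summed / summed.sum()
```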

Metrics#

We can get the entire report using:

model.report()
                          value
explained_variance        0.184936557256948
max_error                 3.11698850484233
median_absolute_error     0.505137356736668
mean_absolute_error       0.616898387532274
mean_squared_error        0.607119613955618
root_mean_squared_error   0.779178807434865
r2                        0.184191932728604
r2_adj                    0.180403350372855
aic                       -634.076590622121
bic                       -598.053805106511
Rows: 1-10 | Columns: 2

Important

Most metrics are computed using a single SQL query, but some of them might require multiple SQL queries. Selecting only the necessary metrics in the report can help optimize performance. E.g. model.report(metrics = ["mse", "r2"]).

You can utilize the score() function to calculate various regression metrics, with the R-squared being the default.
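For reference, the default R-squared metric can be reproduced by hand from actual and predicted values (hypothetical numbers, plain NumPy rather than an in-database computation):

```python
import numpy as np

# Hypothetical actual vs. predicted responses
y_true = np.array([5.0, 6.0, 7.0, 5.0])
y_pred = np.array([5.2, 5.8, 6.9, 5.3])

# R-squared: 1 minus the ratio of residual to total sum of squares
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
```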

model.score()
Out[4]: 0.184191932728604

Prediction#

Prediction is straightforward:

model.predict(
    test,
    [
        "fixed_acidity",
        "volatile_acidity",
        "citric_acid",
        "residual_sugar",
        "chlorides",
        "density",
    ],
    "prediction",
)
[Output: the test vDataFrame with the new "prediction" column (Float(22)) appended — Rows: 1-100 | Columns: 15]

Note

Predictions can be made automatically using the test set, in which case you don’t need to specify the predictors. Alternatively, you can pass only the vDataFrame to the predict() function, but in this case, it’s essential that the column names of the vDataFrame match the predictors and response name in the model.

Plots#

Tree models can be visualized by drawing their tree plots. For more examples, check out Machine Learning - Tree Plots.

model.plot_tree()
[Output: tree plot of the model]

Note

The above example may not render properly in the documentation because of the large size of the tree, but it should render nicely in a Jupyter environment.

To plot the graph with Graphviz separately, you can extract the Graphviz DOT source as follows:

model.to_graphviz()
Out[5]: 'digraph Tree {\ngraph [bgcolor="#FFFFFF00"];\n0 [label="\\"density\\"", shape="box", style="filled", fillcolor="#FFFFFF00", fontcolor="#666666", color="#666666"]\n0 -> 1 [label="<= 0.991973", color="#666666", fontcolor="#666666"]\n0 -> 2 [label="> 0.991973", color="#666666", fontcolor="#666666"]\n1 [label="\\"density\\"", shape="box", style="filled", fillcolor="#FFFFFF00", fontcolor="#666666", color="#666666"]\n1 -> 3 [label="<= 0.990352", color="#666666", fontcolor="#666666"]\n1 -> 4 [label="> 0.990352", color="#666666", fontcolor="#666666"]\n2 [label="\\"density\\"", shape="box", style="filled", fillcolor="#FFFFFF00", fontcolor="#666666", color="#666666"]\n2 -> 5 [label="<= 0.995215", color="#666666", fontcolor="#666666"]\n2 -> 6 [label="> 0.995215", color="#666666", fontcolor="#666666"]\n3 [label="6.568075", fillcolor="#FFFFFF00", fontcolor="#666666", shape="none", color="#666666"]\n4 [label="\\"citric_acid\\"", shape="box", style="filled", fillcolor="#FFFFFF00", fontcolor="#666666", color="#666666"]\n4 -> 7 [label="<= 0.363125", color="#666666", fontcolor="#666666"]\n4 -> 8 [label="> 0.363125", color="#666666", fontcolor="#666666"]\n5 [label="\\"volatile_acidity\\"", shape="box", style="filled", fillcolor="#FFFFFF00", fontcolor="#666666", color="#666666"]\n5 -> 9 [label="<= 0.455", color="#666666", fontcolor="#666666"]\n5 -> 10 [label="> 0.455", color="#666666", fontcolor="#666666"]\n6 [label="\\"citric_acid\\"", shape="box", style="filled", fillcolor="#FFFFFF00", fontcolor="#666666", color="#666666"]\n6 -> 11 [label="<= 0.259375", color="#666666", fontcolor="#666666"]\n6 -> 12 [label="> 0.259375", color="#666666", fontcolor="#666666"]\n7 [label="6.334572", fillcolor="#FFFFFF00", fontcolor="#666666", shape="none", color="#666666"]\n8 [label="6.059524", fillcolor="#FFFFFF00", fontcolor="#666666", shape="none", color="#666666"]\n9 [label="5.89272", fillcolor="#FFFFFF00", fontcolor="#666666", shape="none", color="#666666"]\n10 [label="5.452174", 
fillcolor="#FFFFFF00", fontcolor="#666666", shape="none", color="#666666"]\n11 [label="5.373656", fillcolor="#FFFFFF00", fontcolor="#666666", shape="none", color="#666666"]\n12 [label="5.662469", fillcolor="#FFFFFF00", fontcolor="#666666", shape="none", color="#666666"]\n}'

This string can then be copied into a DOT file which can be parsed by Graphviz.
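For example (using a placeholder DOT string rather than real model output), the source can be written to a file and rendered with the Graphviz `dot` CLI:

```python
# Placeholder standing in for the string returned by model.to_graphviz()
dot_source = 'digraph Tree {\n0 [label="density", shape="box"];\n}'

# Write the DOT source to a file
with open("tree.dot", "w") as f:
    f.write(dot_source)

# Then render it from a shell with Graphviz:
#   dot -Tpng tree.dot -o tree.png
```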

A contour plot is another useful plot that can be produced for models with two predictors.

model.contour()

Important

Machine learning models with two predictors can usually benefit from their own contour plot. This visual representation aids in exploring predictions and gaining a deeper understanding of how these models perform in different scenarios. Please refer to Contour Plot for more examples.

Model Register#

In order to register the model for tracking and versioning:

model.register("model_v1")

Please refer to Model Tracking and Versioning for more details on model tracking and versioning.

Model Exporting#

To Memmodel

model.to_memmodel()

Note

MemModel objects serve as in-memory representations of machine learning models. They can be used for both in-database and in-memory prediction tasks. These objects can be pickled in the same way that you would pickle a scikit-learn model.

The following methods for exporting the model use MemModel, and it is recommended to use MemModel directly.

To SQL

You can get the SQL code by:

model.to_sql()
Out[6]: '((CASE WHEN "density" < 0.991973 THEN (CASE WHEN "density" < 0.990352 THEN 6.568075 ELSE (CASE WHEN "citric_acid" < 0.363125 THEN 6.334572 ELSE 6.059524 END) END) ELSE (CASE WHEN "density" < 0.995215 THEN (CASE WHEN "volatile_acidity" < 0.455 THEN 5.89272 ELSE 5.452174 END) ELSE (CASE WHEN "citric_acid" < 0.259375 THEN 5.373656 ELSE 5.662469 END) END) END) + (CASE WHEN "chlorides" < 0.046625 THEN (CASE WHEN "density" < 0.991973 THEN (CASE WHEN "density" < 0.990352 THEN 6.562162 ELSE 6.245791 END) ELSE (CASE WHEN "density" < 0.995215 THEN 5.950855 ELSE 5.633218 END) END) ELSE (CASE WHEN "volatile_acidity" < 0.54875 THEN (CASE WHEN "citric_acid" < 0.259375 THEN 5.465863 ELSE 5.778739 END) ELSE (CASE WHEN "density" < 1.001698 THEN 5.24359 ELSE 6.2 END) END) END) + (CASE WHEN "density" < 0.991973 THEN (CASE WHEN "residual_sugar" < 4.675 THEN (CASE WHEN "density" < 0.990352 THEN 6.480226 ELSE 6.136808 END) ELSE 6.756098 END) ELSE (CASE WHEN "volatile_acidity" < 0.2675 THEN (CASE WHEN "density" < 1.000078 THEN 5.942701 ELSE 5.375 END) ELSE (CASE WHEN "citric_acid" < 0.259375 THEN 5.370019 ELSE 5.613889 END) END) END) + (CASE WHEN "volatile_acidity" < 0.54875 THEN (CASE WHEN "chlorides" < 0.046625 THEN (CASE WHEN "density" < 0.993594 THEN 6.264507 ELSE 5.682875 END) ELSE (CASE WHEN "citric_acid" < 0.259375 THEN 5.460905 ELSE 5.766667 END) END) ELSE (CASE WHEN "density" < 0.995215 THEN (CASE WHEN "fixed_acidity" < 6.825 THEN 5.375 ELSE 5.904762 END) ELSE (CASE WHEN "chlorides" < 0.140688 THEN 5.285088 ELSE 5.7 END) END) END) + (CASE WHEN "density" < 0.991973 THEN (CASE WHEN "density" < 0.990352 THEN (CASE WHEN "fixed_acidity" < 7.58125 THEN 6.536082 ELSE 5.714286 END) ELSE (CASE WHEN "residual_sugar" < 2.6375 THEN 6.178571 ELSE 6.51145 END) END) ELSE (CASE WHEN "volatile_acidity" < 0.2675 THEN (CASE WHEN "volatile_acidity" < 0.220625 THEN 6.051919 ELSE 5.769737 END) ELSE (CASE WHEN "density" < 0.995215 THEN 5.631347 ELSE 5.414489 END) END) END) + (CASE WHEN 
"density" < 0.991973 THEN (CASE WHEN "density" < 0.990352 THEN (CASE WHEN "volatile_acidity" < 0.2675 THEN 6.302326 ELSE 6.585938 END) ELSE (CASE WHEN "residual_sugar" < 2.6375 THEN 6.085586 ELSE 6.4 END) END) ELSE (CASE WHEN "chlorides" < 0.046625 THEN (CASE WHEN "citric_acid" < 0.2075 THEN 5.355263 ELSE 5.91938 END) ELSE (CASE WHEN "fixed_acidity" < 9.85 THEN 5.555556 ELSE 5.870968 END) END) END) + (CASE WHEN "density" < 0.991973 THEN (CASE WHEN "density" < 0.990352 THEN (CASE WHEN "residual_sugar" < 2.6375 THEN 6.403101 ELSE 6.87931 END) ELSE (CASE WHEN "residual_sugar" < 2.6375 THEN 6.128319 ELSE 6.464912 END) END) ELSE (CASE WHEN "density" < 0.995215 THEN (CASE WHEN "citric_acid" < 0.259375 THEN 5.514523 ELSE 5.978261 END) ELSE (CASE WHEN "citric_acid" < 0.259375 THEN 5.3675 ELSE 5.697904 END) END) END) + (CASE WHEN "citric_acid" < 0.259375 THEN (CASE WHEN "density" < 0.991973 THEN (CASE WHEN "density" < 0.990352 THEN 6.372093 ELSE 5.944444 END) ELSE (CASE WHEN "volatile_acidity" < 1.064375 THEN 5.427892 ELSE 4.0 END) END) ELSE (CASE WHEN "citric_acid" < 0.415 THEN (CASE WHEN "density" < 0.991973 THEN 6.397022 ELSE 5.891921 END) ELSE (CASE WHEN "residual_sugar" < 8.75 THEN 5.73545 ELSE 5.532374 END) END) END) + (CASE WHEN "density" < 0.991973 THEN (CASE WHEN "volatile_acidity" < 0.501875 THEN (CASE WHEN "residual_sugar" < 2.6375 THEN 6.248571 ELSE 6.617143 END) ELSE 5.416667 END) ELSE (CASE WHEN "volatile_acidity" < 0.220625 THEN (CASE WHEN "fixed_acidity" < 6.446875 THEN 5.908257 ELSE 6.106267 END) ELSE (CASE WHEN "citric_acid" < 0.259375 THEN 5.360714 ELSE 5.686627 END) END) END) + (CASE WHEN "density" < 0.991973 THEN (CASE WHEN "density" < 0.990352 THEN (CASE WHEN "volatile_acidity" < 0.2675 THEN 6.341772 ELSE 6.679612 END) ELSE (CASE WHEN "residual_sugar" < 2.6375 THEN 6.086758 ELSE 6.507692 END) END) ELSE (CASE WHEN "volatile_acidity" < 0.220625 THEN (CASE WHEN "fixed_acidity" < 9.09375 THEN 6.124711 ELSE 4.9 END) ELSE (CASE WHEN "citric_acid" < 0.259375 
THEN 5.370304 ELSE 5.695305 END) END) END)) / 10'

To Python

To obtain the prediction function in Python syntax, use the following code:

X = [[4.2, 0.17, 0.36, 1.8, 0.029, 0.9899]]

model.to_python()(X)
Out[8]: array([6.4103844])

Hint

The to_python() method is used to retrieve predictions, probabilities, or cluster distances. For specific details on how to use this method for different model types, refer to the relevant documentation for each model.

__init__(name: str = None, overwrite_model: bool = False, n_estimators: int = 10, max_features: Literal['auto', 'max'] | int = 'auto', max_leaf_nodes: int | float | Decimal = 1000000000.0, sample: float = 0.632, max_depth: int = 5, min_samples_leaf: int = 1, min_info_gain: int | float | Decimal = 0.0, nbins: int = 32) None#

Must be overridden in the child class

Methods

__init__([name, overwrite_model, ...])

Must be overridden in the child class

contour([nbins, chart])

Draws the model's contour plot.

deploySQL([X])

Returns the SQL code needed to deploy the model.

does_model_exists(name[, raise_error, ...])

Checks whether the model is stored in the Vertica database.

drop()

Drops the model from the Vertica database.

export_models(name, path[, kind])

Exports machine learning models.

features_importance([tree_id, show, chart])

Computes the model's features importance.

fit(input_relation, X, y[, test_relation, ...])

Trains the model.

get_attributes([attr_name])

Returns the model attributes.

get_match_index(x, col_list[, str_check])

Returns the matching index.

get_params()

Returns the parameters of the model.

get_plotting_lib([class_name, chart, ...])

Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.

get_score([tree_id])

Returns the feature importance metrics for the input tree.

get_tree([tree_id])

Returns a table with all the input tree information.

get_vertica_attributes([attr_name])

Returns the model Vertica attributes.

import_models(path[, schema, kind])

Imports machine learning models.

plot([max_nb_points, chart])

Draws the model.

plot_tree([tree_id, pic_path])

Draws the input tree.

predict(vdf[, X, name, inplace])

Predicts using the input relation.

register(registered_name[, raise_error])

Registers the model and adds it to in-DB Model versioning environment with a status of 'under_review'.

regression_report([metrics])

Computes a regression report.

report([metrics])

Computes a regression report.

score([metric])

Computes the model score.

set_params([parameters])

Sets the parameters of the model.

summarize()

Summarizes the model.

to_binary(path)

Exports the model to the Vertica Binary format.

to_graphviz([tree_id, classes_color, ...])

Returns the code for a Graphviz tree.

to_memmodel()

Converts the model to an InMemory object that can be used for different types of predictions.

to_pmml(path)

Exports the model to PMML.

to_python([return_proba, ...])

Returns the Python function needed for in-memory scoring without using built-in Vertica functions.

to_sql([X, return_proba, ...])

Returns the SQL code needed to deploy the model without using built-in Vertica functions.

to_tf(path)

Exports the model to the Frozen Graph format (TensorFlow).

Attributes