Wine Quality

This example uses the Wine Quality dataset to predict the quality of white wine. You can download the Jupyter Notebook of the study here.

  • fixed acidity
  • volatile acidity
  • citric acid
  • residual sugar
  • total sulfur dioxide
  • free sulfur dioxide
  • density
  • pH
  • sulphates
  • alcohol
  • quality (score between 0 and 10)

We will follow the data science cycle (Data Exploration - Data Preparation - Data Modeling - Model Evaluation - Model Deployment) to solve this problem.

Initialization

This example uses the following version of VerticaPy:

In [10]:
import verticapy as vp
vp.__version__
Out[10]:
'0.9.0'

Connect to Vertica. This example uses an existing connection called "VerticaDSN." For details on how to create a connection, use see the connection tutorial.

In [2]:
vp.connect("VerticaDSN")

Let's create a Virtual DataFrame of the dataset.

In [3]:
from verticapy.datasets import load_winequality
winequality = load_winequality()
winequality.head(5)
Out[3]:
123
fixed_acidity
Numeric(6,3)
123
volatile_acidity
Numeric(7,4)
123
citric_acid
Numeric(6,3)
123
residual_sugar
Numeric(7,3)
123
chlorides
Float
123
free_sulfur_dioxide
Numeric(7,2)
123
total_sulfur_dioxide
Numeric(7,2)
123
density
Float
123
pH
Numeric(6,3)
123
sulphates
Numeric(6,3)
123
alcohol
Float
123
quality
Int
123
good
Int
Abc
color
Varchar(20)
13.80.310.0211.10.03620.0114.00.992483.750.4412.460white
23.90.2250.44.20.0329.0118.00.9893.570.3612.881white
34.20.170.361.80.02993.0161.00.989993.650.8912.071white
44.20.2150.235.10.04164.0157.00.996883.420.448.030white
54.40.320.394.30.0331.0127.00.989043.460.3612.881white
Rows: 1-5 | Columns: 14

Data Exploration and Preparation

Let's explore the data by displaying descriptive statistics of all the columns.

In [4]:
winequality.describe()
Out[4]:
count
mean
std
min
approx_25%
approx_50%
approx_75%
max
"fixed_acidity"64977.215307064799141.296433757799823.86.47.07.715.9
"volatile_acidity"64970.3396659996921650.1646364740846790.080.230.290.41.58
"citric_acid"64970.3186332153301530.1453178648975910.00.250.310.391.66
"residual_sugar"64975.443235339387424.757803743147410.61.83.08.165.8
"chlorides"64970.05603386178236110.03503360137245910.0090.0380.0470.0650.611
"free_sulfur_dioxide"649730.525319378174617.74939977200251.017.029.041.0289.0
"total_sulfur_dioxide"6497115.74457441896356.52185452263046.077.0118.0156.0440.0
"density"64970.9946966338310.002998673003719150.987110.992340.994890.996991.03898
"pH"64973.218500846544560.1607872021039872.723.113.213.324.01
"sulphates"64970.5312682776666140.148805873614490.220.430.510.62.0
"alcohol"649710.49180083115291.1927117488718.09.510.311.314.9
"quality"64975.818377712790510.8732552715311233.05.06.06.09.0
"good"64970.196552254886870.3974214088953670.00.00.00.01.0
Rows: 1-13 | Columns: 9

The quality of a wine is based on the equilibrium between certain components:

  • For red wines: tannin/smoothness/acidity
  • For white wines: smoothness/acidity

Based on this, we don't have the data to create a good model for red wines (the tannins weren't extracted). We do, however, have enough data to make a good model for white wines, so let's filter out red wines from our study.

In [5]:
winequality.filter(winequality["color"] == 'white').drop(["good", "color"])
1599 elements were filtered
Out[5]:
123
fixed_acidity
Numeric(6,3)
123
volatile_acidity
Numeric(7,4)
123
citric_acid
Numeric(6,3)
123
residual_sugar
Numeric(7,3)
123
chlorides
Float
123
free_sulfur_dioxide
Numeric(7,2)
123
total_sulfur_dioxide
Numeric(7,2)
123
density
Float
123
pH
Numeric(6,3)
123
sulphates
Numeric(6,3)
123
alcohol
Float
123
quality
Int
13.80.310.0211.10.03620.0114.00.992483.750.4412.46
23.90.2250.44.20.0329.0118.00.9893.570.3612.88
34.20.170.361.80.02993.0161.00.989993.650.8912.07
44.20.2150.235.10.04164.0157.00.996883.420.448.03
54.40.320.394.30.0331.0127.00.989043.460.3612.88
64.40.460.12.80.02431.0111.00.988163.480.3413.16
74.40.540.095.10.03852.097.00.990223.410.412.27
84.50.190.210.950.03389.0159.00.993323.340.428.05
94.60.4450.01.40.05311.0178.00.994263.790.5510.25
104.70.1450.291.00.04235.090.00.99083.760.4911.36
114.70.3350.141.30.03669.0168.00.992123.470.4610.55
124.70.4550.181.90.03633.0106.00.987463.210.8314.07
134.70.670.091.00.025.09.00.987223.30.3413.65
144.70.7850.03.40.03623.0134.00.989813.530.9213.86
154.80.130.321.20.04240.098.00.98983.420.6411.87
164.80.170.282.90.0322.0111.00.99023.380.3411.37
174.80.210.2110.20.03717.0112.00.993243.660.4812.27
184.80.2250.381.20.07447.0130.00.991323.310.410.36
194.80.260.2310.60.03423.0111.00.992743.460.2811.57
204.80.290.231.10.04438.0180.00.989243.280.3411.96
214.80.330.06.50.02834.0163.00.99373.350.619.95
224.80.340.06.50.02833.0163.00.99393.360.619.96
234.80.650.121.10.0134.010.00.992463.320.3613.54
244.90.2350.2711.750.0334.0118.00.99543.070.59.46
254.90.330.311.20.01639.0150.00.987133.330.5914.08
264.90.3350.141.30.03669.0168.00.992123.470.4610.46666666666675
274.90.3350.141.30.03669.0168.00.992123.470.4610.46666666666675
284.90.3450.341.00.06832.0143.00.991383.240.410.15
294.90.3450.341.00.06832.0143.00.991383.240.410.15
304.90.470.171.90.03560.0148.00.989643.270.3511.56
315.00.170.561.50.02624.0115.00.99063.480.3910.87
325.00.20.41.90.01520.098.00.98973.370.5512.056
335.00.2350.2711.750.0334.0118.00.99543.070.59.46
345.00.240.195.00.04317.0101.00.994383.670.5710.05
355.00.240.212.20.03931.0100.00.990983.690.6211.76
365.00.240.341.10.03449.0158.00.987743.320.3213.17
375.00.2550.222.70.04346.0153.00.992383.750.7611.36
385.00.270.324.50.03258.0178.00.989563.450.3112.67
395.00.270.324.50.03258.0178.00.989563.450.3112.67
405.00.270.41.20.07642.0124.00.992043.320.4710.16
415.00.290.545.70.03554.0155.00.989763.270.3412.98
425.00.30.333.70.0354.0173.00.98873.360.313.07
435.00.310.06.40.04643.0166.00.9943.30.639.96
445.00.330.161.50.04910.097.00.99173.480.4410.76
455.00.330.161.50.04910.097.00.99173.480.4410.76
465.00.330.161.50.04910.097.00.99173.480.4410.76
475.00.330.184.60.03240.0124.00.991143.180.411.06
485.00.330.2311.80.0323.0158.00.993223.410.6411.86
495.00.350.257.80.03124.0116.00.992413.390.411.36
505.00.350.257.80.03124.0116.00.992413.390.411.36
515.00.440.0418.60.03938.0128.00.99853.370.5710.26
525.00.4550.181.90.03633.0106.00.987463.210.8314.07
535.00.550.148.30.03235.0164.00.99183.530.5112.58
545.00.610.121.30.00965.0100.00.98743.260.3713.55
555.10.110.321.60.02812.090.00.990083.570.5212.26
565.10.140.250.70.03915.089.00.99193.220.439.26
575.10.1650.225.70.04742.0146.00.99343.180.559.96
585.10.210.281.40.04748.0148.00.991683.50.4910.45
595.10.230.181.00.05313.099.00.989563.220.3911.55
605.10.250.361.30.03540.078.00.98913.230.6412.17
615.10.260.331.10.02746.0113.00.989463.350.4311.47
625.10.260.346.40.03426.099.00.994493.230.419.26
635.10.290.288.30.02627.0107.00.993083.360.3711.06
645.10.290.288.30.02627.0107.00.993083.360.3711.06
655.10.30.32.30.04840.0150.00.989443.290.4612.26
665.10.3050.131.750.03617.073.00.993.40.5112.33333333333335
675.10.310.30.90.03728.0152.00.9923.540.5610.16
685.10.330.221.60.02718.089.00.98933.510.3812.57
695.10.330.221.60.02718.089.00.98933.510.3812.57
705.10.330.221.60.02718.089.00.98933.510.3812.57
715.10.330.276.70.02244.0129.00.992213.360.3911.07
725.10.350.266.80.03436.0120.00.991883.380.411.56
735.10.350.266.80.03436.0120.00.991883.380.411.56
745.10.350.266.80.03436.0120.00.991883.380.411.56
755.10.390.211.70.02715.072.00.98943.50.4512.56
765.10.420.011.50.01725.0102.00.98943.380.3612.37
775.10.520.062.70.05230.079.00.99323.320.439.35
785.20.1550.331.60.02813.059.00.989753.30.8411.98
795.20.1550.331.60.02813.059.00.989753.30.8411.98
805.20.160.340.80.02926.077.00.991553.250.5110.16
815.20.170.270.70.0311.068.00.992183.30.419.85
825.20.1850.221.00.0347.0123.00.992183.550.4410.156
835.20.20.273.20.04716.093.00.992353.440.5310.17
845.20.210.311.70.04817.061.00.989533.240.3712.07
855.20.220.466.20.06641.0187.00.993623.190.429.733333333333335
865.20.240.157.10.04332.0134.00.993783.240.489.96
875.20.240.453.80.02721.0128.00.9923.550.4911.28
885.20.240.453.80.02721.0128.00.9923.550.4911.28
895.20.250.231.40.04720.077.00.990013.320.6211.45
905.20.280.291.10.02818.069.00.991683.240.5410.06
915.20.2850.295.150.03564.0138.00.98953.190.3412.48
925.20.30.341.50.03818.096.00.989423.560.4813.08
935.20.310.22.40.02727.0117.00.988863.560.4513.07
945.20.310.365.10.03146.0145.00.98973.140.3112.47
955.20.3350.21.70.03317.074.00.990023.340.4812.36
965.20.340.376.20.03142.0133.00.990763.250.4112.56
975.20.360.021.60.03124.0104.00.98963.440.3512.26
985.20.3650.0813.50.04137.0142.00.9973.460.399.96
995.20.370.331.20.02813.081.00.99023.370.3811.76
1005.20.380.267.70.05320.0103.00.99253.270.4512.26
Rows: 1-100 of 4898 | Columns: 12

Let's draw the correlation matrix of the dataset.

In [6]:
%matplotlib inline
winequality.corr(method = "spearman")
Out[6]:
"fixed_acidity"
"volatile_acidity"
"citric_acid"
"residual_sugar"
"chlorides"
"free_sulfur_dioxide"
"total_sulfur_dioxide"
"density"
"pH"
"sulphates"
"alcohol"
"quality"
"fixed_acidity"1.0-0.04295344115546210.2943519557055970.1016444303644920.0900110347529423-0.02779264833156940.1093887187104840.265711053951591-0.415781573031458-0.0141936686269647-0.103450490979059-0.0796745702348406
"volatile_acidity"-0.04295344115546211.0-0.1454069083790830.107237277630878-0.00339782066183659-0.08338190889885730.1147485415406030.00854780369850073-0.0449435976968945-0.01743666868474510.0353025244854393-0.158558078043926
"citric_acid"0.294351955705597-0.1454069083790831.00.02489103261813680.03620891887063470.09005644858207480.09600645634609480.0946381002313568-0.1472496252419410.0813914907919214-0.03513340617580740.00381493685138116
"residual_sugar"0.1016444303644920.1072372776308780.02489103261813681.00.2246599587936940.3452300155377770.4301034543860880.779068758176437-0.180627450864797-0.00657339265069306-0.441559643430315-0.0923288244199536
"chlorides"0.0900110347529423-0.003397820661836590.03620891887063470.2246599587936941.00.1654698862453890.3733551183476690.505577326845011-0.05598968197799410.0887708062447246-0.569837905062804-0.318382425676257
"free_sulfur_dioxide"-0.0277926483315694-0.08338190889885730.09005644858207480.3452300155377770.1654698862453891.00.6189596424554670.329172049904512-0.008278266941152710.0510877316189511-0.2734204567507060.00379066722904633
"total_sulfur_dioxide"0.1093887187104840.1147485415406030.09600645634609480.4301034543860880.3733551183476690.6189596424554671.00.563857044809788-0.01306978626218150.155569043482336-0.477228700782309-0.204178237570967
"density"0.2657110539515910.008547803698500730.09463810023135680.7790687581764370.5055773268450110.3291720499045120.5638570448097881.0-0.110660332831870.0921206844814246-0.821563801759593-0.357150131425647
"pH"-0.415781573031458-0.0449435976968945-0.147249625241941-0.180627450864797-0.0559896819779941-0.00827826694115271-0.0130697862621815-0.110660332831871.00.1404081751579920.1488642082304350.113164400131828
"sulphates"-0.0141936686269647-0.01743666868474510.0813914907919214-0.006573392650693060.08877080624472460.05108773161895110.1555690434823360.09212068448142460.1404081751579921.0-0.04135567739912990.0303598456826122
"alcohol"-0.1034504909790590.0353025244854393-0.0351334061758074-0.441559643430315-0.569837905062804-0.273420456750706-0.477228700782309-0.8215638017595930.148864208230435-0.04135567739912991.00.444900539842434
"quality"-0.0796745702348406-0.1585580780439260.00381493685138116-0.0923288244199536-0.3183824256762570.00379066722904633-0.204178237570967-0.3571501314256470.1131644001318280.03035984568261220.4449005398424341.0
Rows: 1-12 | Columns: 13

We can see a strong correlation between the density and the alcohol degree (the alcohol degree describes the density of pure ethanol in the wine). We can drop the 'density' column since it doesn't influence the quality of the white wine (instead, its presence will just bias the data).

In [7]:
winequality.drop(["density"])
Out[7]:
123
fixed_acidity
Numeric(6,3)
123
volatile_acidity
Numeric(7,4)
123
citric_acid
Numeric(6,3)
123
residual_sugar
Numeric(7,3)
123
chlorides
Float
123
free_sulfur_dioxide
Numeric(7,2)
123
total_sulfur_dioxide
Numeric(7,2)
123
pH
Numeric(6,3)
123
sulphates
Numeric(6,3)
123
alcohol
Float
123
quality
Int
13.80.310.0211.10.03620.0114.03.750.4412.46
23.90.2250.44.20.0329.0118.03.570.3612.88
34.20.170.361.80.02993.0161.03.650.8912.07
44.20.2150.235.10.04164.0157.03.420.448.03
54.40.320.394.30.0331.0127.03.460.3612.88
64.40.460.12.80.02431.0111.03.480.3413.16
74.40.540.095.10.03852.097.03.410.412.27
84.50.190.210.950.03389.0159.03.340.428.05
94.60.4450.01.40.05311.0178.03.790.5510.25
104.70.1450.291.00.04235.090.03.760.4911.36
114.70.3350.141.30.03669.0168.03.470.4610.55
124.70.4550.181.90.03633.0106.03.210.8314.07
134.70.670.091.00.025.09.03.30.3413.65
144.70.7850.03.40.03623.0134.03.530.9213.86
154.80.130.321.20.04240.098.03.420.6411.87
164.80.170.282.90.0322.0111.03.380.3411.37
174.80.210.2110.20.03717.0112.03.660.4812.27
184.80.2250.381.20.07447.0130.03.310.410.36
194.80.260.2310.60.03423.0111.03.460.2811.57
204.80.290.231.10.04438.0180.03.280.3411.96
214.80.330.06.50.02834.0163.03.350.619.95
224.80.340.06.50.02833.0163.03.360.619.96
234.80.650.121.10.0134.010.03.320.3613.54
244.90.2350.2711.750.0334.0118.03.070.59.46
254.90.330.311.20.01639.0150.03.330.5914.08
264.90.3350.141.30.03669.0168.03.470.4610.46666666666675
274.90.3350.141.30.03669.0168.03.470.4610.46666666666675
284.90.3450.341.00.06832.0143.03.240.410.15
294.90.3450.341.00.06832.0143.03.240.410.15
304.90.470.171.90.03560.0148.03.270.3511.56
315.00.170.561.50.02624.0115.03.480.3910.87
325.00.20.41.90.01520.098.03.370.5512.056
335.00.2350.2711.750.0334.0118.03.070.59.46
345.00.240.195.00.04317.0101.03.670.5710.05
355.00.240.212.20.03931.0100.03.690.6211.76
365.00.240.341.10.03449.0158.03.320.3213.17
375.00.2550.222.70.04346.0153.03.750.7611.36
385.00.270.324.50.03258.0178.03.450.3112.67
395.00.270.324.50.03258.0178.03.450.3112.67
405.00.270.41.20.07642.0124.03.320.4710.16
415.00.290.545.70.03554.0155.03.270.3412.98
425.00.30.333.70.0354.0173.03.360.313.07
435.00.310.06.40.04643.0166.03.30.639.96
445.00.330.161.50.04910.097.03.480.4410.76
455.00.330.161.50.04910.097.03.480.4410.76
465.00.330.161.50.04910.097.03.480.4410.76
475.00.330.184.60.03240.0124.03.180.411.06
485.00.330.2311.80.0323.0158.03.410.6411.86
495.00.350.257.80.03124.0116.03.390.411.36
505.00.350.257.80.03124.0116.03.390.411.36
515.00.440.0418.60.03938.0128.03.370.5710.26
525.00.4550.181.90.03633.0106.03.210.8314.07
535.00.550.148.30.03235.0164.03.530.5112.58
545.00.610.121.30.00965.0100.03.260.3713.55
555.10.110.321.60.02812.090.03.570.5212.26
565.10.140.250.70.03915.089.03.220.439.26
575.10.1650.225.70.04742.0146.03.180.559.96
585.10.210.281.40.04748.0148.03.50.4910.45
595.10.230.181.00.05313.099.03.220.3911.55
605.10.250.361.30.03540.078.03.230.6412.17
615.10.260.331.10.02746.0113.03.350.4311.47
625.10.260.346.40.03426.099.03.230.419.26
635.10.290.288.30.02627.0107.03.360.3711.06
645.10.290.288.30.02627.0107.03.360.3711.06
655.10.30.32.30.04840.0150.03.290.4612.26
665.10.3050.131.750.03617.073.03.40.5112.33333333333335
675.10.310.30.90.03728.0152.03.540.5610.16
685.10.330.221.60.02718.089.03.510.3812.57
695.10.330.221.60.02718.089.03.510.3812.57
705.10.330.221.60.02718.089.03.510.3812.57
715.10.330.276.70.02244.0129.03.360.3911.07
725.10.350.266.80.03436.0120.03.380.411.56
735.10.350.266.80.03436.0120.03.380.411.56
745.10.350.266.80.03436.0120.03.380.411.56
755.10.390.211.70.02715.072.03.50.4512.56
765.10.420.011.50.01725.0102.03.380.3612.37
775.10.520.062.70.05230.079.03.320.439.35
785.20.1550.331.60.02813.059.03.30.8411.98
795.20.1550.331.60.02813.059.03.30.8411.98
805.20.160.340.80.02926.077.03.250.5110.16
815.20.170.270.70.0311.068.03.30.419.85
825.20.1850.221.00.0347.0123.03.550.4410.156
835.20.20.273.20.04716.093.03.440.5310.17
845.20.210.311.70.04817.061.03.240.3712.07
855.20.220.466.20.06641.0187.03.190.429.733333333333335
865.20.240.157.10.04332.0134.03.240.489.96
875.20.240.453.80.02721.0128.03.550.4911.28
885.20.240.453.80.02721.0128.03.550.4911.28
895.20.250.231.40.04720.077.03.320.6211.45
905.20.280.291.10.02818.069.03.240.5410.06
915.20.2850.295.150.03564.0138.03.190.3412.48
925.20.30.341.50.03818.096.03.560.4813.08
935.20.310.22.40.02727.0117.03.560.4513.07
945.20.310.365.10.03146.0145.03.140.3112.47
955.20.3350.21.70.03317.074.03.340.4812.36
965.20.340.376.20.03142.0133.03.250.4112.56
975.20.360.021.60.03124.0104.03.440.3512.26
985.20.3650.0813.50.04137.0142.03.460.399.96
995.20.370.331.20.02813.081.03.370.3811.76
1005.20.380.267.70.05320.0103.03.270.4512.26
Rows: 1-100 of 4898 | Columns: 11

We're working with the scores given by wine tasters, so it's likely that two closely competing wines will have a similar score. Knowing this, a k-nearest neighbors (KNN) model would be best.

KNN is sensitive to unnormalized data so we'll have to normalize our data.

In [8]:
winequality.normalize(["free_sulfur_dioxide", 
                       "residual_sugar", 
                       "pH", 
                       "sulphates", 
                       "volatile_acidity", 
                       "fixed_acidity",
                       "citric_acid",
                       "chlorides",
                       "total_sulfur_dioxide",
                       "alcohol"],
                       method = "robust_zscore")
Out[8]:
123
fixed_acidity
Float
123
volatile_acidity
Float
123
citric_acid
Float
123
residual_sugar
Float
123
chlorides
Float
123
free_sulfur_dioxide
Float
123
total_sulfur_dioxide
Float
123
pH
Float
123
sulphates
Float
123
alcohol
Float
123
quality
Int
1-4.0469445568595710.562075632897163-3.3724537973829761.1054154113644201-0.674490759476594-0.858442784788394-0.4651660410183413.844597329016592-0.2890674683471121.348981518953196
2-3.912046404964252-0.3934529430280140.89932101263546-0.18735854429905424-1.25262569617082-0.306586708852998-0.3721328328146732.630513961958721-1.0599140506060781.618777822743838
3-3.507351949278295-1.0117361392148930.44966050631773-0.6370190506167844-1.348981518953193.6177231644653740.6279741553747613.1701065695399974.0469445568595711.079185215162557
4-3.507351949278295-0.505868069607446-1.011736139214893-0.018735854429905423-0.1927116455647411.8395202531179870.5349409471710931.618777822743828-0.289067468347112-1.618777822743833
5-3.2375556454876570.6744907594765950.786905886056028-0.16862268986914883-1.25262569617082-0.183952025311799-0.162808114356421.888574126534466-1.0599140506060781.618777822743838
6-3.2375556454876572.248302531588651-2.473132784747516-0.4496605063177302-1.83076063286504-0.183952025311799-0.5349409471710932.023472278429786-1.252625696170821.821125050586816
7-3.2375556454876573.147623544224111-2.585547911326948-0.018735854429905423-0.4817791139118531.103712151870792-0.8605571758839321.551328746796169-0.6744907594765951.214083367057877
8-3.102657493592338-0.786905886056028-1.236566392373758-0.7962738132709806-0.9635582278237063.3724537973829760.5814575512729271.079185215162552-0.481779113911854-1.618777822743835
9-2.9677593416970192.079679841719502-3.597284050541841-0.71196246833640620.963558227823706-1.410298860723791.0233652902403514.1143936328072310.770846582258966-0.134898151895325
10-2.8328611898017-1.292773955663474-0.337245379738298-0.7869058860560278-0.096355822782370.0613173417706-1.0233652902403513.9120464049642520.1927116455647410.6070416835289366
11-2.83286118980170.843113449345744-2.023472278429786-0.7306983227663115-0.6744907594765942.1461069619709850.7907822697311811.956023202482126-0.0963558227823710.06744907594765935
12-2.83286118980172.192094968298934-1.573811772112055-0.618283196186879-0.674490759476594-0.0613173417706-0.6512324574256780.2023472278429793.4688096201653472.428166734115747
13-2.83286118980174.609020189756734-2.585547911326948-0.7869058860560278-2.21618392399452-1.778202911347387-2.9072877563646340.809388911371914-1.252625696170822.15837043032515
14-2.83286118980175.901794145420208-3.597284050541841-0.33724537973829766-0.674490759476594-0.6744907594765950.02.3607176581680834.3360120252066832.293268582220426
15-2.697963037906381-1.4613966455326230.0-0.749434177196217-0.096355822782370.367904050623597-0.8372988738330151.6187778227438281.6380489873003030.9442870632672347
16-2.697963037906381-1.011736139214893-0.44966050631773-0.43092465188782475-1.25262569617082-0.735808101247195-0.5349409471710931.34898151895319-1.252625696170820.6070416835289367
17-2.697963037906381-0.562075632897163-1.2365663923737580.9367927214952713-0.578134936694223-1.042394810100193-0.5116826451201763.2375556454876570.0963558227823711.214083367057877
18-2.697963037906381-0.3934529430280140.674490759476595-0.7494341771962172.987030506253490.797125443017794-0.0930332082036680.876837987319574-0.674490759476595-0.06744907594765936
19-2.6979630379063810.0-1.0117361392148931.011736139214893-0.867202405041335-0.674490759476595-0.5349409471710931.888574126534466-1.8307606328650440.7419398354242547
20-2.6979630379063810.337245379738298-1.011736139214893-0.76817003162612240.09635582278237070.2452693670823981.0698818943421850.674490759476595-1.252625696170821.011736139214896
21-2.6979630379063810.786905886056028-3.5972840505418410.24356610758877054-1.445337341735560.00.6744907594765951.1466342911102121.34898151895319-0.3372453797382985
22-2.6979630379063810.89932101263546-3.5972840505418410.24356610758877054-1.44533734173556-0.06131734177060.6744907594765951.2140833670578711.34898151895319-0.3372453797382986
23-2.6979630379063814.384189936597869-2.248302531588651-0.7681700316261224-2.89067468347112-1.839520253117987-2.8840294543137170.944287063267233-1.0599140506060782.090921354377444
24-2.563064886011062-0.281037816448581-0.5620756328971631.2271984651588053-1.252625696170820.0-0.372132832814673-0.7419398354242550.289067468347112-0.6744907594765956
25-2.5630648860110620.786905886056028-0.112415126579433-0.749434177196217-2.601607215124010.3065867088529980.3721328328146731.0117361392148931.1562698733884492.428166734115748
26-2.5630648860110620.843113449345744-2.023472278429786-0.7306983227663115-0.6744907594765942.1461069619709850.7907822697311811.956023202482126-0.0963558227823710.04496605063179565
27-2.5630648860110620.843113449345744-2.023472278429786-0.7306983227663115-0.6744907594765942.1461069619709850.7907822697311811.956023202482126-0.0963558227823710.04496605063179565
28-2.5630648860110620.9555285759251760.224830253158865-0.78690588605602782.40889556955927-0.1226346835411990.2093247184582540.404694455685957-0.674490759476595-0.2023472278429795
29-2.5630648860110620.9555285759251760.224830253158865-0.78690588605602782.40889556955927-0.1226346835411990.2093247184582540.404694455685957-0.674490759476595-0.2023472278429795
30-2.5630648860110622.360717658168083-1.686226898691488-0.618283196186879-0.7708465822589641.5942508860355890.3256162287128390.607041683528936-1.1562698733884490.7419398354242546
31-2.428166734115743-1.0117361392148932.697963037906381-0.6932266139065008-1.6380489873003-0.613173417705996-0.4419077389674242.023472278429786-0.7708465822589660.2697963037906387
32-2.428166734115743-0.6744907594765950.89932101263546-0.618283196186879-2.69796303790638-0.858442784788394-0.8372988738330151.2815324430055310.7708465822589661.112909753136386
33-2.428166734115743-0.281037816448581-0.5620756328971631.2271984651588053-1.252625696170820.0-0.372132832814673-0.7419398354242550.289067468347112-0.6744907594765956
34-2.428166734115743-0.224830253158865-1.461396645532623-0.037471708859810850.0-1.042394810100193-0.7675239676802633.3050047214353160.963558227823707-0.2697963037906385
35-2.428166734115743-0.224830253158865-1.236566392373758-0.5620756328971628-0.385423291129482-0.183952025311799-0.7907822697311813.4399028733306351.4453373417355610.8768379873195736
36-2.428166734115743-0.2248302531588650.224830253158865-0.7681700316261224-0.8672024050413350.9197601265589930.558199249222010.944287063267233-1.4453373417355611.821125050586817
37-2.428166734115743-0.056207563289716-1.124151265794325-0.468396360747635630.00.7358081012471950.4419077389674243.8445973290165922.7943188606887510.6070416835289366
38-2.4281667341157430.1124151265794330.0-0.13115098100933797-1.059914050606081.4716162024943891.0233652902403511.821125050586807-1.5416931645179321.483879670848517
39-2.4281667341157430.1124151265794330.0-0.13115098100933797-1.059914050606081.4716162024943891.0233652902403511.821125050586807-1.5416931645179321.483879670848517
40-2.4281667341157430.1124151265794330.89932101263546-0.7494341771962173.179742151818230.490538734164796-0.2325830205091710.9442870632672330.0-0.2023472278429796
41-2.4281667341157430.3372453797382982.4731327847475160.09367927214952712-0.7708465822589641.2263468354119910.4884243430692590.607041683528936-1.252625696170821.686226898691498
42-2.4281667341157430.449660506317730.112415126579433-0.2810378164485814-1.252625696170821.2263468354119910.9070737799857661.214083367057871-1.6380489873003031.753675974639157
43-2.4281667341157430.562075632897163-3.5972840505418410.22483025315886510.2890674683471120.5518560759353960.7442656656293460.8093889113719141.541693164517932-0.3372453797382986
44-2.4281667341157430.786905886056028-1.79864202527092-0.69322661390650080.578134936694224-1.471616202494389-0.8605571758839322.023472278429786-0.2890674683471120.2023472278429786
45-2.4281667341157430.786905886056028-1.79864202527092-0.69322661390650080.578134936694224-1.471616202494389-0.8605571758839322.023472278429786-0.2890674683471120.2023472278429786
46-2.4281667341157430.786905886056028-1.79864202527092-0.69322661390650080.578134936694224-1.471616202494389-0.8605571758839322.023472278429786-0.2890674683471120.2023472278429786
47-2.4281667341157430.786905886056028-1.573811772112055-0.11241512657943255-1.059914050606080.367904050623597-0.2325830205091710.0-0.6744907594765950.4046944556859576
48-2.4281667341157430.786905886056028-1.0117361392148931.236566392373758-1.25262569617082-0.6744907594765950.558199249222011.5513287467961691.6380489873003030.9442870632672346
49-2.4281667341157431.011736139214893-0.7869058860560280.4871322151775411-1.15626987338845-0.613173417705996-0.4186494369165071.41643059490085-0.6744907594765950.6070416835289366
50-2.4281667341157431.011736139214893-0.7869058860560280.4871322151775411-1.15626987338845-0.613173417705996-0.4186494369165071.41643059490085-0.6744907594765950.6070416835289366
51-2.4281667341157432.023472278429786-3.1476235442241112.510604493607327-0.3854232911294820.245269367082398-0.1395498123055021.2815324430055310.963558227823707-0.134898151895326
52-2.4281667341157432.192094968298934-1.573811772112055-0.618283196186879-0.674490759476594-0.0613173417706-0.6512324574256780.2023472278429793.4688096201653472.428166734115747
53-2.4281667341157433.260038670803543-2.0234722784297860.5808114873270682-1.059914050606080.06131734177060.6977490615275122.3607176581680830.3854232911294831.416430594900858
54-2.4281667341157433.934529430280138-2.248302531588651-0.7306983227663115-3.27609797460061.900837594888586-0.7907822697311810.539592607581276-0.9635582278237072.090921354377445
55-2.293268582220424-1.6862268986914880.0-0.6744907594765953-1.44533734173556-1.34898151895319-1.0233652902403512.6305139619587210.4817791139118541.214083367057876
56-2.293268582220424-1.34898151895319-0.786905886056028-0.8431134493457442-0.385423291129482-1.165029493641392-1.0466235922912680.269796303790638-0.385423291129483-0.8093889113719156
57-2.293268582220424-1.067943702504609-1.1241512657943250.093679272149527120.3854232911294830.4905387341647960.2790996246110050.00.770846582258966-0.3372453797382986
58-2.293268582220424-0.562075632897163-0.44966050631773-0.71196246833640620.3854232911294830.8584427847883940.3256162287128392.1583704303251050.1927116455647410.05
59-2.293268582220424-0.337245379738298-1.573811772112055-0.78690588605602780.963558227823706-1.287664177182591-0.8140405717820980.269796303790638-0.7708465822589660.7419398354242545
60-2.293268582220424-0.1124151265794330.44966050631773-0.7306983227663115-0.7708465822589640.367904050623597-1.3024649148513560.3372453797382981.6380489873003031.146634291110217
61-2.2932685822204240.00.112415126579433-0.7681700316261224-1.541693164517930.735808101247195-0.4884243430692591.146634291110212-0.3854232911294830.6744907594765957
62-2.2932685822204240.00.2248302531588650.2248302531588651-0.867202405041335-0.490538734164796-0.8140405717820980.337245379738298-0.578134936694224-0.8093889113719156
63-2.2932685822204240.337245379738298-0.449660506317730.5808114873270682-1.6380489873003-0.429221392394197-0.6279741553747611.214083367057871-0.9635582278237070.4046944556859576
64-2.2932685822204240.337245379738298-0.449660506317730.5808114873270682-1.6380489873003-0.429221392394197-0.6279741553747611.214083367057871-0.9635582278237070.4046944556859576
65-2.2932685822204240.44966050631773-0.224830253158865-0.54333977846725730.4817791139118530.3679040506235970.3721328328146730.741939835424255-0.0963558227823711.214083367057876
66-2.2932685822204240.505868069607446-2.135887405009218-0.6463869778317372-0.674490759476594-1.042394810100193-1.4187564251059421.4838796708485090.3854232911294831.304015468321395
67-2.2932685822204240.562075632897163-0.224830253158865-0.8056417404859333-0.578134936694223-0.3679040506235970.4186494369165072.4281667341157430.867202405041337-0.2023472278429796
68-2.2932685822204240.786905886056028-1.124151265794325-0.6744907594765953-1.54169316451793-0.981077468329593-1.0466235922912682.225819506272764-0.8672024050413371.416430594900857
69-2.2932685822204240.786905886056028-1.124151265794325-0.6744907594765953-1.54169316451793-0.981077468329593-1.0466235922912682.225819506272764-0.8672024050413371.416430594900857
70-2.2932685822204240.786905886056028-1.124151265794325-0.6744907594765953-1.54169316451793-0.981077468329593-1.0466235922912682.225819506272764-0.8672024050413371.416430594900857
71-2.2932685822204240.786905886056028-0.5620756328971630.2810378164485814-2.023472278429780.613173417705996-0.1162915102545851.214083367057871-0.7708465822589660.4046944556859577
72-2.2932685822204241.011736139214893-0.6744907594765950.2997736708784868-0.8672024050413350.122634683541199-0.3256162287128391.34898151895319-0.6744907594765950.7419398354242546
73-2.2932685822204241.011736139214893-0.6744907594765950.2997736708784868-0.8672024050413350.122634683541199-0.3256162287128391.34898151895319-0.6744907594765950.7419398354242546
74-2.2932685822204241.011736139214893-0.6744907594765950.2997736708784868-0.8672024050413350.122634683541199-0.3256162287128391.34898151895319-0.6744907594765950.7419398354242546
75-2.2932685822204241.461396645532623-1.236566392373758-0.6557549050466899-1.54169316451793-1.165029493641392-1.4420147271568592.158370430325105-0.1927116455647411.416430594900856
76-2.2932685822204241.79864202527092-3.484868923962408-0.6932266139065008-2.50525139234164-0.551856075935396-0.7442656656293461.34898151895319-1.0599140506060781.281532443005537
77-2.2932685822204242.922793291065246-2.922793291065246-0.468396360747635630.867202405041336-0.245269367082398-1.2792066128004390.944287063267233-0.385423291129483-0.7419398354242545
78-2.158370430325105-1.1803588290840420.112415126579433-0.6744907594765953-1.44533734173556-1.287664177182591-1.7443726538187810.8093889113719143.5651654429477171.011736139214898
79-2.158370430325105-1.1803588290840420.112415126579433-0.6744907594765953-1.44533734173556-1.287664177182591-1.7443726538187810.8093889113719143.5651654429477171.011736139214898
80-2.158370430325105-1.1241512657943250.224830253158865-0.8243775949158387-1.34898151895319-0.490538734164796-1.3257232169022730.4721435316336170.385423291129483-0.2023472278429796
81-2.158370430325105-1.011736139214893-0.562075632897163-0.8431134493457442-1.25262569617082-1.41029886072379-1.5350479353605270.809388911371914-0.578134936694224-0.4046944556859575
82-2.158370430325105-0.843113449345744-1.124151265794325-0.7869058860560278-1.252625696170820.797125443017794-0.2558413225600882.495615810063402-0.289067468347112-0.1686226898691496
83-2.158370430325105-0.674490759476595-0.562075632897163-0.37471708859810850.385423291129483-1.103712151870792-0.95359038408761.7536759746391470.578134936694224-0.2023472278429797
84-2.158370430325105-0.562075632897163-0.112415126579433-0.65575490504668990.481779113911853-1.042394810100193-1.6978560497169460.404694455685957-0.9635582278237071.079185215162557
85-2.158370430325105-0.449660506317731.5738117721120550.187358544299054242.216183923994520.4292213923941971.2326900086986050.06744907594766-0.481779113911854-0.4496605063177325
86-2.158370430325105-0.224830253158865-1.9110571518503530.35598123416820310.0-0.1226346835411990.00.4046944556859570.096355822782371-0.3372453797382986
87-2.158370430325105-0.2248302531588651.461396645532623-0.26230196201867595-1.54169316451793-0.797125443017794-0.1395498123055022.4956158100634020.1927116455647410.5395926075812758
88-2.158370430325105-0.2248302531588651.461396645532623-0.26230196201867595-1.54169316451793-0.797125443017794-0.1395498123055022.4956158100634020.1927116455647410.5395926075812758
89-2.158370430325105-0.112415126579433-1.011736139214893-0.71196246833640620.385423291129483-0.858442784788394-1.3257232169022730.9442870632672331.4453373417355610.6744907594765955
90-2.1583704303251050.224830253158865-0.337245379738298-0.7681700316261224-1.44533734173556-0.981077468329593-1.511789633309610.4046944556859570.674490759476595-0.2697963037906386
91-2.1583704303251050.281037816448581-0.337245379738298-0.009367927214952712-0.7708465822589641.8395202531179870.0930332082036680.06744907594766-1.252625696170821.348981518953198
92-2.1583704303251050.449660506317730.224830253158865-0.6932266139065008-0.481779113911853-0.981077468329593-0.8838154779348492.5630648860110620.0963558227823711.753675974639158
93-2.1583704303251050.562075632897163-1.34898151895319-0.5246039240373519-1.54169316451793-0.429221392394197-0.395391134865592.563064886011062-0.1927116455647411.753675974639157
94-2.1583704303251050.5620756328971630.44966050631773-0.018735854429905423-1.156269873388450.7358081012471950.255841322560088-0.269796303790638-1.5416931645179321.348981518953197
95-2.1583704303251050.843113449345744-1.34898151895319-0.6557549050466899-0.963558227823706-1.042394810100193-1.3954981230550241.0791852151625520.0963558227823711.281532443005536
96-2.1583704303251050.899321012635460.5620756328971630.18735854429905424-1.156269873388450.490538734164796-0.0232583020509170.472143531633617-0.5781349366942241.416430594900856
97-2.1583704303251051.124151265794325-3.372453797382976-0.6744907594765953-1.15626987338845-0.613173417705996-0.6977490615275121.753675974639147-1.1562698733884491.214083367057876
98-2.1583704303251051.180358829084042-2.6979630379063811.5550759176821503-0.1927116455647410.1839520253117990.1860664164073371.888574126534466-0.770846582258966-0.3372453797382986
99-2.1583704303251051.2365663923737580.112415126579433-0.749434177196217-1.44533734173556-1.287664177182591-1.2326900086986051.281532443005531-0.8672024050413370.8768379873195736
100-2.1583704303251051.34898151895319-0.6744907594765950.468396360747635630.963558227823706-0.858442784788394-0.7210073635784290.607041683528936-0.1927116455647411.214083367057876
Rows: 1-100 | Columns: 11

Machine Learning

Let's create our KNN model.

In [9]:
from verticapy.learn.neighbors import KNeighborsRegressor
from verticapy.learn.model_selection import cross_validate

predictors = winequality.get_columns(exclude_columns = ["quality"])
model = KNeighborsRegressor(name = "winequality_KNN", n_neighbors = 50)
cross_validate(model, winequality, predictors, "quality")

Out[9]:
explained_variance
max_error
median_absolute_error
mean_absolute_error
mean_squared_error
root_mean_squared_error
r2
r2_adj
aic
bic
time
1-fold0.343202643342752.760.460.5627489309712890.5104530238240680.71445995256841930.3430562357723460.33901599121990045-1078.649099958337-1019.4047351457160.12743496894836426
2-fold0.3135185211591983.380.460.5688970588235290.5304612745098040.72832772466095510.3128698769691540.30863094962164717-1012.5386180336403-953.32840410583870.13380694389343262
3-fold0.345579540553783.160.460.5632293577981650.5137668501529050.716775313576650.3453709392355810.34133997211264744-1066.7239828734503-1007.49326569432860.1439499855041504
avg0.3341002350185763.10.460.5649584491976610.51822704949559230.71985433026867480.333765683992360340.32966230431806504-1052.6372336218092-993.40880164862780.13506396611531576
std0.0178638634995326620.31432467291003430.00.00341938398293184340.0107239245385350730.0074289321941281580.0181332712546298930.01825071593314536735.23459001376162535.217885747356330.00832895681398358
Rows: 1-5 | Columns: 12

Our model is pretty good. Our predicted scores have a median absolute error of less than 0.5. If we want to improve this model, we'll probably need more relevant features.

Conclusion

We've solved our problem in a Pandas-like way, all without ever loading data into memory!


VerticaPy

Python API for Vertica Data Science at Scale

About the Author


Badr Ouali
Head of Data Science


Badr Ouali works as a Lead Data Scientist for Vertica worldwide. He can embrace data projects end to end through a clear understanding of the “big picture” as well as attention to details, resulting in achieving great business outcomes – a distinctive differentiator in his role. Badr enjoys sharing knowledge and insights related to data analytics with colleagues & peers and has a sweet spot for Python. He loves helping customers finding the best value from their data and empower them to solve their use-cases.