Commodities

This example uses the 'Commodities' dataset to predict the price of different commodities. You can download the Jupyter Notebook of the study here.

  • date: Date of the record
  • Gold: Price per ounce of Gold
  • Oil: Price per Barrel - West Texas Intermediate (WTI)
  • Spread: Interest Rate Spreads.
  • Vix: The CBOE Volatility Index (VIX) is a measure of expected price fluctuations in the SP500 Index options over the next 30 days.
  • Dol_Eur: How much $1 US is in euros.
  • SP500: The S&P 500, or simply the S&P, is a stock market index that measures the stock performance of 500 large companies listed on stock exchanges in the United States.

We will follow the data science cycle (Data Exploration - Data Preparation - Data Modeling - Model Evaluation - Model Deployment) to solve this problem.

Initialization

This example uses the following version of VerticaPy:

In [1]:
import verticapy as vp
vp.__version__
Out[1]:
'0.9.0'

Connect to Vertica. This example uses an existing connection called "VerticaDSN." For details on how to create a connection, use see the connection tutorial.

In [2]:
vp.connect("VerticaDSN")

Let's create a Virtual DataFrame of the dataset.

In [3]:
from verticapy.datasets import load_commodities
commodities = load_commodities()
display(commodities)
📅
date
Date
123
Gold
Float
123
Oil
Float
123
Spread
Float
123
Vix
Float
123
Dol_Eur
Float
123
SP500
Float
11986-01-01345.56136363636422.92545454545451.0514285714285718.12136363636361.12159999999858211.779999
21986-02-01339.052515.45473684210530.73684210526315820.62421052631581.07880000000296226.919998
31986-03-01346.09473684210512.61250.56423.5641.04850000000442238.899994
41986-04-01340.71590909090912.84363636363640.60409090909090923.01545454545451.05259999999544235.520004
51986-05-01342.32515.3776190476190.64238095238095218.88751.03720000000612247.350006
61986-06-01342.79761904761913.42571428571430.61476190476190518.59809523809521.0399999999936250.839996
71986-07-01348.55434782608711.58454545454550.63681818181818219.63909090909091.01029999999446236.119995
81986-08-01376.2915.09666666666670.8395238095238118.63809523809520.979300000000876252.929993
91986-09-01418.15227272727314.86666666666671.1014285714285722.70523809523810.973200000000361231.320007
101986-10-01423.86304347826114.89681818181821.1472727272727322.52391304347830.961600000000544243.979996
111986-11-01396.982515.22157894736840.97388888888888918.63157894736840.967399999999543249.220001
121986-12-01391.59523809523816.1076190476190.84090909090909119.75863636363640.957399999999325242.169998
131987-01-01408.5238095238118.65142857142860.858520.76666666666670.900400000000445274.079987
141987-02-01401.04517.74894736842110.85263157894736823.44631578947370.882600000000821284.200012
151987-03-01408.84772727272718.30285714285710.82409090909090921.83727272727270.883599999999206291.700012
161987-04-01439.66518.67714285714290.99857142857142926.88142857142860.872100000000501288.359985
171987-05-01461.6519.43750.851525.41150.861100000000079290.100006
181987-06-01449.27727272727320.07318181818180.82909090909090921.62454545454550.876500000000306304.0
191987-07-01450.33043478260921.34217391304351.0045454545454517.80090909090910.888600000000224318.660004
201987-08-01460.987520.31095238095241.0076190476190520.84904761904760.89600000000064329.799988
211987-09-01460.12045454545519.531.0780952380952422.89380952380950.873499999999694321.829987
221987-10-01465.76363636363619.85909090909091.1223809523809558.21954545454550.868599999999788251.789993
231987-11-01468.14047619047618.8541.1678947368421149.43650.814700000000812230.300003
241987-12-01487.07857142857117.27454545454551.1341.76409090909090.791400000000067247.080002
251988-01-01477.757517.12951.0357894736842138.33650.800100000000384257.070007
261988-02-01442.1238095238116.79571428571431.031533.6740.822000000000116267.820007
271988-03-01443.49130434782616.19739130434781.1030434782608729.35695652173910.810900000000402258.890015
281988-04-01451.55789473684217.86251.12927.4050.805700000000798261.329987
291988-05-01451.3217.42363636363641.0933333333333325.71666666666670.813099999999395262.160004
301988-06-01451.65681818181816.52772727272730.89545454545454625.27090909090910.842500000000655273.5
311988-07-01437.45238095238115.49750.77723.6440.886300000000119272.019989
321988-08-01431.06363636363615.52347826086960.63173913043478323.70739130434780.905899999999747261.519989
331988-09-01413.43863636363614.53545454545450.51380952380952419.52428571428570.900299999999334271.910004
341988-10-01406.39047619047613.77047619047620.441520.48190476190480.87960000000021278.970001
351988-11-01419.96590909090914.14136363636360.296521.65904761904760.844499999999243273.700012
361988-12-01419.247516.3823809523810.017142857142857117.75428571428570.843999999999141277.720001
371989-01-01404.44523809523818.0242857142857-0.08417.75809523809520.880499999999302297.470001
381989-02-01387.972517.9365-0.20263157894736818.31263157894740.888899999999921288.859985
391989-03-01390.2738095238119.4840909090909-0.32181818181818217.47545454545450.896699999999328294.869995
401989-04-01384.7221.069-0.27616.88750.899199999999837309.640015
411989-05-01371.3520.1234782608696-0.15863636363636417.4350.9375320.519989
421989-06-01367.72727272727320.0504545454545-0.12909090909090916.73227272727270.955400000000736317.980011
431989-07-01375.20952380952419.78142857142860.19918.05650.914000000000669346.079987
441989-08-01365.54772727272718.5778260869565-0.032173913043478319.32347826086960.927799999999479351.450012
451989-09-01361.79761904761919.5914285714286-0.09416.74650.941100000000006349.149994
461989-10-01366.820.09772727272730.028571428571428621.95090909090910.907199999999648340.359985
471989-11-01394.36136363636419.85590909090910.078571428571428620.9323809523810.893899999999121345.98999
481989-12-01409.65526315789521.10.05918.2350.856499999999869353.399994
491990-01-01410.11818181818222.86318181818180.12142857142857123.49909090909090.831899999999223329.079987
501990-02-01416.542522.1130.10263157894736823.32684210526320.820799999999508331.890015
511990-03-01393.66136363636420.3877272727273-0.038181818181818218.83136363636360.835100000000239339.940002
521990-04-01374.92894736842118.42550.061520.45750.82510000000002330.799988
531990-05-01368.85476190476218.19954545454550.11590909090909118.15181818181820.811400000000504361.230011
541990-06-01352.65714285714316.69523809523810.12904761904761917.58285714285710.817699999999604358.019989
551990-07-01361.82045454545518.45409090909090.31428571428571418.93285714285710.792699999999968356.149994
561990-08-01394.86136363636427.30739130434780.69130434782608727.77521739130430.759899999999106322.559998
571990-09-01389.5633.50750.81368421052631628.81894736842110.761699999999109306.049988
581990-10-01381.33260869565236.03956521739130.84181818181818230.13043478260870.739799999999377304.0
591990-11-01381.86590909090932.33227272727270.793525.10666666666670.72400000000016322.220001
601990-12-01378.16052631578927.2810.76122.58450.731999999999971330.220001
611991-01-01384.59090909090925.23409090909090.96761904761904826.93181818181820.736699999999473343.929993
621991-02-01363.747520.47750.98789473684210522.11263157894740.722599999999147367.070007
631991-03-01363.3919.90151.00719.04105263157890.781800000000658375.220001
641991-04-01358.05476190476220.831.0909090909090918.92727272727270.826699999999619375.339996
651991-05-01357.11666666666721.23227272727271.2840909090909117.18045454545450.83389999999963389.829987
661991-06-01366.3620.1891.32817.5750.868700000000899371.160004
671991-07-01368.01304347826121.40304347826091.35517.66954545454550.870699999999488387.809998
681991-08-01356.72142857142921.69363636363641.4668181818181815.93272727272730.850300000000061395.429993
691991-09-01348.45952380952421.88651.468517.0290.828100000000632387.859985
701991-10-01358.82608695652223.23086956521741.61516.84636363636360.82550000000083392.450012
711991-11-01359.95952380952422.46095238095241.8584210526315817.68150.795000000000073375.220001
721991-12-01361.87519.49809523809522.0628571428571418.01857142857140.768099999999322417.089996
731992-01-01354.43636363636418.78545454545452.0742857142857117.49954545454550.770599999999831408.779999
741992-02-01353.852519.01252.1257894736842117.05052631578950.791999999999462412.700012
751992-03-01344.64090909090918.92181818181821.8572727272727316.22272727272730.812700000000404403.690002
761992-04-01338.727520.232.1361904761904816.18857142857140.804899999999179414.950012
771992-05-01337.03947368421120.97552.166514.7280.78859999999986415.350006
781992-06-01340.78409090909122.38454545454552.2136363636363614.75318181818180.767900000000736408.140015
791992-07-01352.45217391304321.77521739130432.4890909090909113.30363636363640.729799999999159424.209991
801992-08-01343.602521.33904761904762.3928571428571414.4223809523810.71349999999984414.029999
811992-09-01345.321.88136363636362.5209523809523814.25857142857140.721999999999753417.799988
821992-10-01344.27727272727321.68590909090912.5061904761904817.47863636363640.755199999999604418.679993
831992-11-01334.92380952380920.33852.2936842105263214.5940.807199999999284431.350006
841992-12-01334.65714285714319.41409090909092.0963636363636412.78136363636360.807199999999284435.709991
851993-01-01328.992519.0322.2112.5840.825000000000728438.779999
861993-02-01329.3120.08611111111112.1605263157894713.62473684210530.845300000000861443.380005
871993-03-01329.97391304347820.32217391304352.0256521739130413.69304347826090.848400000000766451.670013
881993-04-01341.947520.25252.1319047619047613.45666666666670.820100000000821440.190002
891993-05-01367.04473684210519.94952.058513.48550.821200000000317450.190002
901993-06-01371.91363636363619.09409090909091.7990909090909112.97318181818180.844300000000658450.529999
911993-07-01392.03409090909117.891.7333333333333311.86950.878399999999601448.130005
921993-08-01379.79523809523818.01090909090911.6763636363636411.95318181818180.882400000000416463.559998
931993-09-01355.56136363636417.50428571428571.5128571428571412.6490.848200000000361458.929993
941993-10-01364.00476190476218.15333333333331.459511.37523809523810.859300000000076467.829987
951993-11-01373.93863636363616.60904761904761.568513.37095238095240.885800000000018461.790009
961993-12-01383.24285714285714.51476190476191.5618181818181810.86454545454550.885800000000018466.450012
971994-01-01387.1115.02666666666671.610510.60523809523810.897499999999127481.609985
981994-02-01381.657514.78105263157891.4989473684210512.88526315789470.894800000000032467.140015
991994-03-01384.014.68086956521741.4873913043478314.24086956521740.876000000000204445.769989
1001994-04-01377.90789473684216.421.4226315789473715.33473684210530.877300000000105450.910004
Rows: 1-100 | Columns: 7

Data Exploration and Preparation

Let's explore the data by displaying descriptive statistics of all the columns.

In [6]:
commodities.describe(method = "all", unique = True)
Out[6]:
123
"Gold"
Float
100%
123
"Oil"
Float
100%
123
"Spread"
Float
100%
123
"Vix"
Float
100%
123
"Dol_Eur"
Float
100%
123
"SP500"
Float
100%
📅
"date"
Date
100%
dtypefloatfloatfloatfloatfloatfloatdate
percent100100100100100100100
count416416416416416416416
top1715.6973684210594.75666666666670.073333333333333317.1560.807199999999284375.2200011986-01-01
top_percent0.240.240.4810.240.4810.4810.24
avg720.56980975381544.05792524586241.0882753197875120.14017195484480.8496165456979391190.84896648798[null]
stddev470.810563184228.92818425601060.858643636323358.612423150193050.107742852952828752.215482101544[null]
min256.19772727272711.3472727272727-0.4131578947368428.020.634608695652174211.7799991986-01-01
approx_25%356.89983766233819.7715259740260.30414.00761904761910.770517047499652467.657494[null]
approx_50%421.55601190476231.41886128364391.0031818181818118.066250.8363299503251191132.5[null]
approx_75%1206.6068609022662.69510351966871.8063636363636423.92046428571420.9004831521742471454.767487[null]
max1971.17133.882.8342105263157965.44652173913041.1716423443500.3100592020-08-01
range1714.97227272727122.5327272727273.2473684210526357.42652173913040.5370336483478263288.5300612631
empty[null][null][null][null][null][null][null]
unique413.0416.0416.0416.0413.0416.0414.0
Rows: 1-15 | Columns: 8

We have data from January 1986 to the beginning of August 2020. We don't have any missing values, so our data is already clean.

Let's draw the different variables.

In [8]:
%matplotlib inline
commodities.plot(ts = "date")
Out[8]:
<AxesSubplot:xlabel='"date"'>

Some of the commodities have an upward monotonic trend and some others might be stationary. Let's use Augmented Dickey-Fuller tests to check our hypotheses.

In [15]:
from verticapy.stats import adfuller
from verticapy import *

fuller = {}
for commodity in ["Gold", "Oil", "Spread", "Vix", "Dol_Eur", "SP500"]:
    result = adfuller(commodities,
                      column = commodity,
                      ts = "date",
                      p = 3,
                      with_trend = True)
    fuller["index"] = result["index"]
    fuller[commodity] = result["value"]
fuller = tablesample(fuller)
display(fuller)
Gold
Oil
Spread
Vix
Dol_Eur
SP500
ADF Test Statistic-0.8615535609119586-2.8096937546469283-2.4384304559428434-4.930594978481499-2.7980161815599516-0.09469625542100038
p_value0.3894417818987940.005198660953067320.01517903617794631.19713909737974e-060.005386622700619620.924602810088177
# Lags used333333
# Observations Used416416416416416416
Critical Value (1%)3.983.983.983.983.983.98
Critical Value (2.5%)-3.68-3.68-3.68-3.68-3.68-3.68
Critical Value (5%)-3.42-3.42-3.42-3.42-3.42-3.42
Critical Value (10%)-3.13-3.13-3.13-3.13-3.13-3.13
Stationarity (alpha = 1%)
Rows: 1-9 | Columns: 7

As expected: The price of gold and the S&P 500 index are not stationary. Let's use the Mann-Kendall test to confirm the trends.

In [26]:
from verticapy.stats import mkt

kendall = {}
for commodity in ["Gold", "SP500"]:
    result = mkt(commodities,
                 column = commodity,
                 ts = "date")
    kendall["index"] = result["index"]
    kendall[commodity] = result["value"]
kendall = tablesample(kendall)
display(kendall)
Gold
SP500
Mann Kendall Test Statistic15.00107543372596424.886617341458717
S42504.070513.0
STDS2833.330196076692833.33001960591
p_value7.223928538831201e-511.0387121156452156e-136
Monotonic Trend
Trendincreasingincreasing
Rows: 1-6 | Columns: 3

Our hypothesis is correct. We can also look at the correlation between the elapsed time and our variables to see the different trends.

In [27]:
import verticapy.stats as st
commodities["elapsed_days"] = commodities["date"] - st.min(commodities["date"])._over()
commodities.corr(focus = "elapsed_days")
Out[27]:
"elapsed_days"
"elapsed_days"1.0
"SP500"0.912
"Gold"0.83
"Oil"0.712
"Spread"0.178
"Dol_Eur"-0.177
"Vix"-0.123
Rows: 1-7 | Columns: 2

In the last plot, it's a bit hard to tell if 'Spread' is stationary. Let's draw it alone.

In [28]:
commodities["Spread"].plot(ts = "date")
Out[28]:
<AxesSubplot:xlabel='"date"', ylabel='"Spread"'>

We can see some sudden changes, so let's smooth the curve.

In [29]:
commodities.rolling(func = "avg",
                    window = (-20, 0),
                    columns = "Spread",
                    order_by = ["date"],
                    name = "Spread_smooth",)
commodities["Spread_smooth"].plot(ts = "date")
Out[29]:
<AxesSubplot:xlabel='"date"', ylabel='"Spread_smooth"'>

After each local minimum, there is a local maximum. Let's look at the number of lags needed to keep most of the information. To visualize this, we can draw the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.

In [31]:
from verticapy.learn.model_selection import plot_acf_pacf
plot_acf_pacf(commodities,
              column = "Spread",
              ts = "date",
              p = 60)

Out[31]:
acf
pacf
confidence
01.01.00.09609514041450774
10.9870.9866658694826180.16516404714137115
20.966-0.3015619683842320.1703898204716388
30.9450.08441665941415430.17098407552667444
40.92-0.2018850341788340.1733971035027304
50.8950.0618547398370990.17381378139741463
60.868-0.08967850606301460.1744580710199262
70.838-0.1033056881493180.17524412731694863
80.805-0.08770120262951320.17587100727343766
90.767-0.1466824960287170.17723645098664653
100.726-0.03981068496333630.17753907309122982
110.6850.01309437212902340.17776727092127834
120.6450.01276751653437690.17799585214443142
130.605-0.007121708369808660.17821926658370002
140.5660.04750163377677780.17856158886339543
150.526-0.07092439283069350.17905342712099498
160.482-0.1422383071905170.18035763635727056
170.4380.04325174960253440.18068321561332523
180.3950.006923743474481460.1809126197556603
190.352-0.03633108158643150.18121082181014225
200.31-0.01065326631906620.1814455468371735
210.269-0.05957003637862280.18186493917383315
220.227-0.02239653975939720.18212244130940808
230.1860.02052030941497910.18237657192122106
240.146-0.01628192771290790.18262327292378183
250.105-0.02407676934187560.18288780094153362
260.066-0.03161821935761950.18317588764089568
270.028-0.021429844756720.18343590572251958
28-0.01-0.03733330066427630.18374725482094137
29-0.0440.08016030316510370.1843308503381982
30-0.077-0.03535841862062950.18463686572755553
31-0.1080.03297793478848270.18493518398481934
32-0.1370.01920586849527970.18519575467060828
33-0.164-0.03348159254291220.18549799053069815
34-0.191-0.07071349200834930.18601115955984251
35-0.2160.06356132698110520.18647368054930963
36-0.238-0.01111129090417130.1867255635874214
37-0.2580.03317512721828870.18703139473757538
38-0.2750.01911881410261620.18729846157410004
39-0.29-0.0219081460183270.18757277864287758
40-0.304-0.04926014369203110.1879539922316871
41-0.317-0.01688105131620930.18821994086248556
42-0.3270.07590123999707190.1887851045856074
43-0.335-0.01638945545422880.18905263191850366
44-0.340.01217549676094870.18931465071332243
45-0.3410.06124883119137580.18977441282629565
46-0.3390.02725269948816290.19007126527783125
47-0.3350.05667310123830670.19050423806891068
48-0.329-0.01445918216486240.1907743398221931
49-0.3230.003781810935513560.19103485719949814
50-0.316-0.05458359931032310.19145905457055504
51-0.307-0.008879108121582040.19172547565390927
52-0.297-0.02747024950333020.19203013025700777
53-0.286-0.03306244609275610.19235460115993974
54-0.2730.02859021625396110.19266512778188527
55-0.259-0.05413559708953440.193093365609904
56-0.2450.001957743684680570.19336157637058388
57-0.231-0.0204443090228650.19365379142665426
58-0.217-0.04319459151710190.1940272796833102
59-0.2040.02080325539518570.19432280243198308
60-0.19-0.01036300136476030.19460149126928603
Rows: 1-61 | Columns: 4

We can clearly see the influence of the last two values on 'Spread,' which makes sense. When the curve slightly changes its direction, it will increase/decrease until reaching a new local maximum/minimum. Only the recent values can help the prediction in case of autoregressive periodical model. The local minimums of interest rate spreads are indicators of an economic crisis.

We saw the correlation between the price-per-barrel of Oil and the time. Let's look at the time series plot of this variable.

In [32]:
commodities["Oil"].plot(ts = "date")
Out[32]:
<AxesSubplot:xlabel='"date"', ylabel='"Oil"'>

Our chart shows effects of the 2010s oil glut and its lowest price in two decades with COVID-19.

In [33]:
plot_acf_pacf(commodities,
              column = "Oil",
              ts = "date",
              p = 60)

Out[33]: