VerticaPy

Python API for Vertica Data Science at Scale

Time Series

Time series models are regression models fitted to data indexed by a timestamp column, so the order of the observations matters.

The following example creates a time series model to predict the number of forest fires in Brazil with the 'Amazon' dataset.

In [43]:
from verticapy import *
amazon = vDataFrame("amazon_clean")
display(amazon)
       date          number
       Date             Int
  1    1998-01-01         0
  2    1998-02-01         0
  3    1998-03-01         0
  4    1998-04-01         0
  5    1998-05-01         0
  6    1998-06-01      3551
  7    1998-07-01      8066
  8    1998-08-01     35549
  9    1998-09-01     41968
 10    1998-10-01     23495
 11    1998-11-01      6804
 12    1998-12-01      4448
 13    1999-01-01      1081
 14    1999-02-01      1284
 15    1999-03-01       667
 16    1999-04-01       717
 17    1999-05-01      1812
 18    1999-06-01      3632
 19    1999-07-01      8756
 20    1999-08-01     39486
 21    1999-09-01     36913
 22    1999-10-01     27012
 23    1999-11-01      8860
 24    1999-12-01      4376
 25    2000-01-01       778
 26    2000-02-01       561
 27    2000-03-01       848
 28    2000-04-01       537
 29    2000-05-01      2097
 30    2000-06-01      6275
 31    2000-07-01      4739
 32    2000-08-01     22202
 33    2000-09-01     23291
 34    2000-10-01     27336
 35    2000-11-01      8399
 36    2000-12-01      4465
 37    2001-01-01       547
 38    2001-02-01      1059
 39    2001-03-01      1268
 40    2001-04-01      1081
 41    2001-05-01      2090
 42    2001-06-01      8433
 43    2001-07-01      6490
 44    2001-08-01     31887
 45    2001-09-01     39834
 46    2001-10-01     31038
 47    2001-11-01     15639
 48    2001-12-01      6201
 49    2002-01-01      1654
 50    2002-02-01      1570
 51    2002-03-01      1679
 52    2002-04-01      1682
 53    2002-05-01      3818
 54    2002-06-01     10839
 55    2002-07-01     13751
 56    2002-08-01     57151
 57    2002-09-01     55803
 58    2002-10-01     47722
 59    2002-11-01     28179
 60    2002-12-01     11944
 61    2003-01-01      5091
 62    2003-02-01      2398
 63    2003-03-01      2749
 64    2003-04-01      2677
 65    2003-05-01      1747
 66    2003-06-01      6506
 67    2003-07-01     11804
 68    2003-08-01     43736
 69    2003-09-01     76325
 70    2003-10-01     43295
 71    2003-11-01     23572
 72    2003-12-01     15342
 73    2004-01-01      2705
 74    2004-02-01      1255
 75    2004-03-01      2040
 76    2004-04-01      1335
 77    2004-05-01      3535
 78    2004-06-01     14262
 79    2004-07-01     23809
 80    2004-08-01     49325
 81    2004-09-01     83500
 82    2004-10-01     40331
 83    2004-11-01     30763
 84    2004-12-01     17524
 85    2005-01-01      4990
 86    2005-02-01      2153
 87    2005-03-01      1706
 88    2005-04-01      1011
 89    2005-05-01      3210
 90    2005-06-01      5811
 91    2005-07-01     15663
 92    2005-08-01     51981
 93    2005-09-01     76257
 94    2005-10-01     49876
 95    2005-11-01     21752
 96    2005-12-01      6354
 97    2006-01-01      3255
 98    2006-02-01      1666
 99    2006-03-01      1774
100    2006-04-01       792
Rows: 1-100 | Columns: 2

The 'date' column tells us that we should use a time series model. To predict a value in a time series, we use the values observed at earlier timestamps, called 'lags'.
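To make the idea of lags concrete, here is a minimal sketch in plain Python that turns a series into (lagged values, target) pairs. The series is made-up toy data and `make_lagged_rows` is a hypothetical helper written for this illustration; it is not part of VerticaPy.

```python
# Toy series (made-up values, not the Amazon data).
series = [10, 12, 15, 11, 9, 14, 18, 13]


def make_lagged_rows(values, lags):
    """Pair each target value with the values observed `lag` steps earlier.

    Hypothetical helper for illustration only.
    """
    max_lag = max(lags)
    rows = []
    # The first `max_lag` points have no complete history, so skip them.
    for t in range(max_lag, len(values)):
        features = [values[t - lag] for lag in lags]
        rows.append((features, values[t]))
    return rows


rows = make_lagged_rows(series, lags=[1, 2])
for features, target in rows:
    print(features, "->", target)
```

Each row holds the previous value (lag 1) and the one before it (lag 2) as predictors of the current value, which is the shape of data an autoregressive model is fitted on.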

To help visualize the seasonality of forest fires, we'll draw some autocorrelation plots.

In [44]:
amazon.acf(ts = "date", 
           column = "number",
           p = 48)
amazon.pacf(ts = "date", 
            column = "number",
            p = 48)
Out[44]:
      value                 confidence
 0    1.0                   0.12677953091477834
 1    0.680791943478559     0.17635811053763534
 2    -0.448651760602       0.19431587020757393
 3    -0.056810938511524    0.19499672195507936
 4    -0.214072572565421    0.19920783402950884
 5    -0.132275379180003    0.2010667082835581
 6    -0.209271515399161    0.2050497710828973
 7    -0.22086005226401     0.20938483891715812
 8    -0.115381512422639    0.21088997523164352
 9    -0.0303897676702676   0.21142090542967368
10    0.195940057811936     0.21490010161856163
11    0.421096854695467     0.22882273960076843
12    0.354085600891183     0.23839868796477467
13    -0.277539454272164    0.24434402010546982
14    -0.0466619873562256   0.24503815910113524
15    -0.0252179360480865   0.24562891469251005
16    -0.0208627356674878   0.24620949094162303
17    -0.0738925038464602   0.2471459776695441
18    -0.0277152168151683   0.24775839671186387
19    -0.0570139111880927   0.24854931191055846
20    -0.0359062473532943   0.249206893259297
21    0.0368926532844579    0.2498738171394671
22    0.183517089844233     0.2528182062297603
23    0.290406504204887     0.259254137753213
24    0.148748898561334     0.26137328696950884
25    -0.252848551418073    0.26632779888317776
26    -0.0207691117701225   0.2669813898197994
27    0.0353140856274143    0.26769474980282326
28    -0.0921054257296545   0.2689033274675358
29    0.0154355178357686    0.2695589819239978
30    -0.0390214624263903   0.27030664827764966
31    -0.0361906130217691   0.27104490437804446
32    -0.0476484001557326   0.27185384167555543
33    -0.0190915628486309   0.2725378227507441
34    -0.0448275804550625   0.27333953753511986
35    0.311750035943525     0.2806082427371645
36    0.0433932016985829    0.28142518911642866
37    -0.115879309798946    0.2830246303006458
38    -0.0134882291090413   0.28374005275607883
39    0.0304452233355777    0.2845111009329001
40    -0.0850278455500817   0.2857139410476577
41    0.00500894018005289   0.28643623160894244
42    -0.0359016190911242   0.28724981822952284
43    -0.0515636004698935   0.28816256058581097
44    -0.102742842573654    0.28961940722024054
45    -0.147438988566255    0.2918435574074429
46    -0.0890791253319279   0.29313793703895247
47    0.037269291025163     0.2939948682267674
48    -0.108829668970081    0.2955705134853682
Rows: 1-49 | Columns: 3
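
For intuition about what an autocorrelation coefficient measures, the sketch below computes the plain sample autocorrelation of a synthetic monthly series with a 12-step cycle. It uses the textbook estimator, which is an assumption for illustration: it is not the partial autocorrelation and not necessarily the estimator VerticaPy applies. The function name `sample_acf` and the sine-wave data are hypothetical.

```python
import math


def sample_acf(values, max_lag):
    """Textbook sample autocorrelation for lags 0..max_lag (illustrative only)."""
    n = len(values)
    mean = sum(values) / n
    # Lag-0 autocovariance (unnormalised); every coefficient is scaled by it.
    denom = sum((v - mean) ** 2 for v in values)
    coeffs = []
    for k in range(max_lag + 1):
        num = sum((values[t] - mean) * (values[t - k] - mean) for t in range(k, n))
        coeffs.append(num / denom)
    return coeffs


# Synthetic series: ten "years" of monthly values with a yearly cycle.
series = [100 + 50 * math.sin(2 * math.pi * t / 12) for t in range(120)]
coeffs = sample_acf(series, max_lag=24)

for lag in (0, 6, 12, 24):
    print(lag, round(coeffs[lag], 3))
```

On such a series the coefficient is strongly positive at lags 12 and 24 and strongly negative at lag 6, which is the kind of repeating pattern to look for when checking for seasonality.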

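The coefficients peak just before and at multiples of 12 months: lags 11, 12, 23, and 35 all exceed their confidence bounds. Forest fires therefore follow a yearly seasonal pattern, which should make it practical to predict future forest fires from past data.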

VerticaPy offers several time series models, including one for multiple time series. For this example, let's use a SARIMAX model with a seasonal period of 12 (s = 12) to match the yearly cycle of the monthly data.

In [46]:
from verticapy.learn.tsa import SARIMAX

model = SARIMAX("SARIMAX_amazon",
                p = 1,
                d = 0,
                q = 0,
                P = 4,
                D = 0,
                Q = 0,
                s = 12)
model.fit(input_relation = "amazon_clean", 
          y = "number", 
          ts = "date")
Out[46]:
=======
details
=======

# Coefficients

     predictor            coefficient  
1    Intercept       157.796898394296  
2          ar1      0.227469801171249  
3         ar12      0.223437485648028  
4         ar24      0.332300398258616  
5         ar36      0.323432558611675  
6         ar48    -0.0577341008764085  
Rows: 1-6 | Columns: 2

===============
Additional Info
===============
Input Relation : amazon_clean
y : "number"
ts : "date"
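
The coefficient table suggests that the fitted model is a linear combination of the intercept and the values at lags 1, 12, 24, 36, and 48. The sketch below applies the coefficients by hand under that assumption; it is a reading of the printed output, not a description of VerticaPy's internals. The lagged values are taken from the data table shown earlier, and the target month (2006-05-01, the month after the last displayed row) is chosen purely for illustration.

```python
# Coefficients copied from the model output above.
intercept = 157.796898394296
ar_coefficients = {
    1: 0.227469801171249,
    12: 0.223437485648028,
    24: 0.332300398258616,
    36: 0.323432558611675,
    48: -0.0577341008764085,
}

# Observed values at each lag relative to 2006-05-01 (from the data table).
lagged_values = {
    1: 792,    # 2006-04-01
    12: 3210,  # 2005-05-01
    24: 3535,  # 2004-05-01
    36: 1747,  # 2003-05-01
    48: 3818,  # 2002-05-01
}

# Assumption: the coefficients enter additively, as in a linear regression on lags.
prediction = intercept + sum(
    coef * lagged_values[lag] for lag, coef in ar_coefficients.items()
)
print(round(prediction, 2))
```

Under this additive assumption the expression evaluates to roughly 2,574 fires. The value that model.predict() would return may differ if the implementation treats the terms differently.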

Just like with other regression models, we'll evaluate our model with the regression_report() method.

In [47]:
model.regression_report()
Out[47]:
                           value
explained_variance         0.687658531277065
max_error                  62492.5846405041700760
median_absolute_error      1926.16510474475
mean_absolute_error        6244.63330879244
mean_squared_error         124238623.160803
root_mean_squared_error    11146.238072139093
r2                         0.722927644647118
r2_adj                     0.7169818859485584
Rows: 1-8 | Columns: 2
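
As a reminder of what several of these metrics mean, here is a small sketch that computes them from scratch using their standard definitions. The toy values and the helper `regression_metrics` are hypothetical and are not how VerticaPy computes the report.

```python
import math
from statistics import median


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "max_error": max(abs_errors),
        "median_absolute_error": median(abs_errors),
        "mean_absolute_error": sum(abs_errors) / n,
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot,
    }


# Toy observations and predictions (made-up values).
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 11]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```

The same relationship holds in the report above: the root_mean_squared_error (about 11,146) is the square root of the mean_squared_error (about 124,238,623).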

We can also plot the model's predictions using one-step-ahead and dynamic forecasting.

In [48]:
model.plot(amazon,
           nlead = 150,
           dynamic = True)
Out[48]:
<AxesSubplot:title={'center':'SARIMAX(1,0,0)(4,0,0)_12'}, xlabel='"date"'>
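
The difference between the two forecasting modes can be sketched with a toy AR(1) model: one-step-ahead forecasting predicts each point from the observed previous value, while dynamic forecasting feeds each prediction back in as the next input. The coefficients, data, and function names below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def one_step_ahead(values, intercept, phi):
    """Predict each point from the *observed* previous value."""
    return [intercept + phi * values[t - 1] for t in range(1, len(values))]


def dynamic_forecast(last_value, intercept, phi, steps):
    """Forecast `steps` points ahead, reusing each prediction as the next input."""
    forecasts = []
    previous = last_value
    for _ in range(steps):
        previous = intercept + phi * previous
        forecasts.append(previous)
    return forecasts


# Toy AR(1) model and observations (made-up values).
observed = [10.0, 12.0, 9.0, 11.0]
intercept, phi = 2.0, 0.5

one_step = one_step_ahead(observed, intercept, phi)
dynamic = dynamic_forecast(observed[-1], intercept, phi, steps=3)
print(one_step)
print(dynamic)
```

Because dynamic forecasts never see new observations, their errors can accumulate, and for a stable model like this one they drift toward a fixed level (here intercept / (1 - phi) = 4).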

This concludes the fundamental lessons on machine learning algorithms in VerticaPy.