Time Series

Time series models are regression models fit to data in which each observation is labeled with a timestamp.

The following example creates a time series model to predict the number of forest fires in Brazil with the ‘Amazon’ dataset.

[7]:
from verticapy.datasets import load_amazon
amazon = load_amazon().groupby("date", "SUM(number) AS number")
display(amazon)
       date (Date)    number (Integer)
1      1998-01-01                    0
2      1998-02-01                    0
3      1998-03-01                    0
4      1998-04-01                    0
5      1998-05-01                    0
6      1998-06-01                 3551
7      1998-07-01                 8066
8      1998-08-01                35549
9      1998-09-01                41968
10     1998-10-01                23495
11     1998-11-01                 6804
12     1998-12-01                 4448
13     1999-01-01                 1081
14     1999-02-01                 1284
15     1999-03-01                  667
16     1999-04-01                  717
17     1999-05-01                 1812
18     1999-06-01                 3632
19     1999-07-01                 8756
20     1999-08-01                39486
21     1999-09-01                36913
22     1999-10-01                27012
23     1999-11-01                 8860
24     1999-12-01                 4376
25     2000-01-01                  778
26     2000-02-01                  561
27     2000-03-01                  848
28     2000-04-01                  537
29     2000-05-01                 2097
30     2000-06-01                 6275
31     2000-07-01                 4739
32     2000-08-01                22202
33     2000-09-01                23291
34     2000-10-01                27336
35     2000-11-01                 8399
36     2000-12-01                 4465
37     2001-01-01                  547
38     2001-02-01                 1059
39     2001-03-01                 1268
40     2001-04-01                 1081
41     2001-05-01                 2090
42     2001-06-01                 8433
43     2001-07-01                 6490
44     2001-08-01                31887
45     2001-09-01                39834
46     2001-10-01                31038
47     2001-11-01                15639
48     2001-12-01                 6201
49     2002-01-01                 1654
50     2002-02-01                 1570
51     2002-03-01                 1679
52     2002-04-01                 1682
53     2002-05-01                 3818
54     2002-06-01                10839
55     2002-07-01                13751
56     2002-08-01                57151
57     2002-09-01                55803
58     2002-10-01                47722
59     2002-11-01                28179
60     2002-12-01                11944
61     2003-01-01                 5091
62     2003-02-01                 2398
63     2003-03-01                 2749
64     2003-04-01                 2677
65     2003-05-01                 1747
66     2003-06-01                 6506
67     2003-07-01                11804
68     2003-08-01                43736
69     2003-09-01                76325
70     2003-10-01                43295
71     2003-11-01                23572
72     2003-12-01                15342
73     2004-01-01                 2705
74     2004-02-01                 1255
75     2004-03-01                 2040
76     2004-04-01                 1335
77     2004-05-01                 3535
78     2004-06-01                14262
79     2004-07-01                23809
80     2004-08-01                49325
81     2004-09-01                83500
82     2004-10-01                40331
83     2004-11-01                30763
84     2004-12-01                17524
85     2005-01-01                 4990
86     2005-02-01                 2153
87     2005-03-01                 1706
88     2005-04-01                 1011
89     2005-05-01                 3210
90     2005-06-01                 5811
91     2005-07-01                15663
92     2005-08-01                51981
93     2005-09-01                76257
94     2005-10-01                49876
95     2005-11-01                21752
96     2005-12-01                 6354
97     2006-01-01                 3255
98     2006-02-01                 1666
99     2006-03-01                 1774
100    2006-04-01                  792
Rows: 1-100 | Columns: 2

The ‘date’ column tells us that we should be working with a time series model. To make predictions on a time series, we use its previous values, called ‘lags’, as predictors.
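To make the idea of lags concrete, here is a minimal sketch in plain Python (no VerticaPy involved). It pairs each monthly total with the values one and two months earlier, using the twelve 1998 totals from the table above. The helper name `make_lagged_rows` is hypothetical and only serves the illustration.

```python
# Monthly totals for 1998, copied from the 'number' column above.
number = [0, 0, 0, 0, 0, 3551, 8066, 35549, 41968, 23495, 6804, 4448]


def make_lagged_rows(values, lags):
    """Pair each value with the earlier values at the requested lags.

    Hypothetical helper for illustration; rows that would need values
    from before the start of the series are skipped.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(values)):
        features = [values[t - k] for k in lags]
        rows.append((features, values[t]))
    return rows


rows = make_lagged_rows(number, lags=[1, 2])
for features, target in rows:
    print(features, "->", target)
```

Each printed row reads as "lag 1, lag 2 -> current value". A time series regression learns how to map the left-hand side to the right-hand side.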

To help visualize the seasonality of forest fires, we’ll draw some autocorrelation plots.

[8]:
amazon.acf(ts = "date",
           column = "number",
           p = 48)
amazon.pacf(ts = "date",
            column = "number",
            p = 48)

[8]:
      value                   confidence
0     1.0                     0.12677953091477837
1     0.680791943478559       0.17635811053763537
2     -0.448651760602         0.19431587020757396
3     -0.056810938511524      0.1949967219550794
4     -0.214072572565421      0.19920783402950895
5     -0.132275379180003      0.20106670828355816
6     -0.209271515399161      0.20504977108289735
7     -0.22086005226401       0.20938483891715817
8     -0.115381512422639      0.21088997523164357
9     -0.0303897676702676     0.21142090542967373
10    0.195940057811936       0.21490010161856166
11    0.421096854695467       0.2288227396007685
12    0.354085600891183       0.23839868796477473
13    -0.277539454272164      0.24434402010546988
14    -0.0466619873562256     0.2450381591011353
15    -0.0252179360480865     0.2456289146925101
16    -0.0208627356674878     0.2462094909416231
17    -0.0738925038464602     0.24714597766954416
18    -0.0277152168151683     0.24775839671186392
19    -0.0570139111880927     0.2485493119105585
20    -0.0359062473532943     0.24920689325929707
21    0.0368926532844579      0.2498738171394672
22    0.183517089844233       0.25281820622976037
23    0.290406504204887       0.25925413775321304
24    0.148748898561334       0.2613732869695089
25    -0.252848551418073      0.2663277988831778
26    -0.0207691117701225     0.26698138981979946
27    0.0353140856274143      0.2676947498028233
28    -0.0921054257296545     0.26890332746753587
29    0.0154355178357686      0.2695589819239979
30    -0.0390214624263903     0.2703066482776497
31    -0.0361906130217691     0.2710449043780445
32    -0.0476484001557326     0.27185384167555554
33    -0.0190915628486309     0.2725378227507442
34    -0.0448275804550625     0.2733395375351199
35    0.311750035943525       0.28060824273716456
36    0.0433932016985829      0.2814251891164287
37    -0.115879309798946      0.28302463030064584
38    -0.0134882291090413     0.2837400527560789
39    0.0304452233355777      0.28451110093290016
40    -0.0850278455500817     0.28571394104765774
41    0.00500894018005289     0.2864362316089425
42    -0.0359016190911242     0.2872498182295229
43    -0.0515636004698935     0.288162560585811
44    -0.102742842573654      0.2896194072202406
45    -0.147438988566255      0.29184355740744294
46    -0.0890791253319279     0.2931379370389525
47    0.037269291025163       0.2939948682267675
48    -0.108829668970081      0.2955705134853683
Rows: 1-49 | Columns: 3
[Figures: autocorrelation (ACF) and partial autocorrelation (PACF) plots of "number", lags 0 to 48]

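Forest fires follow a seasonal pattern that repeats roughly every 12 months: the values above rise again near lags 12, 24, and 36. Because the pattern is this regular, past data should be informative for predicting future forest fires.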
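To see why a seasonal series produces autocorrelation peaks at multiples of its period, here is a minimal sketch of the standard sample autocorrelation in plain Python. The series is synthetic (a made-up 12-month pattern repeated four times), not the Amazon data, and the `autocorrelation` function is an illustration rather than VerticaPy's implementation.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation at the given lag (standard biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum(
        (values[t] - mean) * (values[t + lag] - mean)
        for t in range(n - lag)
    )
    return num / denom


# Synthetic monthly pattern with a late-year peak, repeated for 4 "years".
pattern = [0, 0, 0, 0, 0, 4, 8, 36, 42, 23, 7, 4]
series = pattern * 4

for lag in (0, 6, 12, 24):
    print(lag, round(autocorrelation(series, lag), 3))
```

At lag 12 the series lines up with itself, so the correlation is strongly positive. At lag 6 the peak months are matched with quiet months, so it is negative.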

VerticaPy offers several models, including a multiple time series model. For this example, let’s use a SARIMAX model.

[10]:
from verticapy.learn.tsa import SARIMAX
model = SARIMAX("SARIMAX_amazon",
                p = 1,
                d = 0,
                q = 0,
                P = 4,
                D = 0,
                Q = 0,
                s = 12)
model.fit(amazon,
          y = "number",
          ts = "date")
[10]:
=======
details
=======

# Coefficients

     predictor            coefficient
1    Intercept       157.796898394296
2          ar1      0.227469801171249
3         ar12      0.223437485648028
4         ar24      0.332300398258616
5         ar36      0.323432558611675
6         ar48    -0.0577341008764085
Rows: 1-6 | Columns: 2

===============
Additional Info
===============
Input Relation : (SELECT "date", "number" FROM (SELECT "date", SUM(number) AS number FROM "public"."amazon" GROUP BY 1) VERTICAPY_SUBTABLE) VERTICAPY_SUBTABLE
y : "number"
ts : "date"
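The predictor names in the coefficient table follow from the parameters: p = 1 gives the lag-1 term (ar1), and P = 4 with s = 12 gives the seasonal lags 12, 24, 36, and 48. As a rough worked example, the sketch below combines the coefficients with lagged values from the data table to estimate the total for 2006-05-01. It assumes the fitted model is applied as a plain linear combination of the lagged values. That is our assumption for illustration; the source does not confirm how VerticaPy evaluates the model internally.

```python
# Coefficients copied from the model summary above, keyed by lag.
intercept = 157.796898394296
coefficients = {
    1: 0.227469801171249,
    12: 0.223437485648028,
    24: 0.332300398258616,
    36: 0.323432558611675,
    48: -0.0577341008764085,
}

# Observed values at each lag relative to 2006-05-01, from the data table:
# 2006-04-01, 2005-05-01, 2004-05-01, 2003-05-01, 2002-05-01.
lagged_values = {1: 792, 12: 3210, 24: 3535, 36: 1747, 48: 3818}

# ASSUMPTION: prediction = intercept + sum(coefficient * lagged value).
prediction = intercept + sum(
    coefficients[lag] * lagged_values[lag] for lag in coefficients
)
print(round(prediction, 1))
```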

Just like with other regression models, we’ll evaluate our model with the report() method.

[11]:
model.report()
[11]:
                             value
explained_variance           0.722933514390621
max_error                    62492.5846405041700760
median_absolute_error        1926.16510474475
mean_absolute_error          6244.63330879244
mean_squared_error           124238623.160803
root_mean_squared_error      11146.238072139093
r2                           0.722927644647118
r2_adj                       0.7149197731051271
aic                          3348.6392952754804
bic                          3367.2752380175016
Rows: 1-10 | Columns: 2
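For readers who want to see how a few of these metrics relate to each other, here is a minimal sketch using the textbook definitions in plain Python. The `y_true` and `y_pred` values are hypothetical, and `regression_metrics` is an illustrative helper, not the code behind report().

```python
import math


def regression_metrics(y_true, y_pred):
    """Textbook definitions of a few common regression metrics."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "max_error": max(abs(e) for e in errors),
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        "r2": 1 - sum(e * e for e in errors) / ss_tot,
    }


# Hypothetical observations and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 2.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# Consistency check on the report above: RMSE is the square root of MSE.
reported_rmse = math.sqrt(124238623.160803)
print(reported_rmse)
```

The last line reproduces the reported root_mean_squared_error from the reported mean_squared_error, up to rounding.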

We can also plot the model's predictions, using both one-step-ahead and dynamic forecasting.

[12]:
model.plot(amazon,
           nlead = 150,
           dynamic = True)
[12]:
<AxesSubplot:title={'center':'SARIMAX(1,0,0)(4,0,0)_12'}, xlabel='"date"'>
[Figure: SARIMAX(1,0,0)(4,0,0)_12 forecast plot, x-axis "date"]
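The difference between the two forecasting modes can be sketched with a toy AR(1) model in plain Python. One-step-ahead forecasting predicts each point from the previous observed value. Dynamic forecasting feeds each prediction back in as the next input, so errors can accumulate over the horizon. The intercept, coefficient, and observations below are hypothetical, and the functions are illustrations rather than VerticaPy's plotting logic.

```python
def one_step_ahead(observed, intercept, phi):
    """Predict each point from the previous *observed* value."""
    return [intercept + phi * observed[t - 1] for t in range(1, len(observed))]


def dynamic_forecast(last_observed, intercept, phi, steps):
    """Predict forward, reusing each *prediction* as the next input."""
    predictions = []
    previous = last_observed
    for _ in range(steps):
        previous = intercept + phi * previous
        predictions.append(previous)
    return predictions


# Hypothetical AR(1) parameters and observations.
intercept, phi = 1.0, 0.5
observed = [2.0, 4.0, 6.0]

print(one_step_ahead(observed, intercept, phi))
print(dynamic_forecast(observed[-1], intercept, phi, steps=3))
```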

This concludes the fundamental lessons on machine learning algorithms in VerticaPy.