vDataFrame.acf

In [ ]:
vDataFrame.acf(column: str,
               ts: str,
               by: list = [],
               p=12,
               unit: str = "rows",
               method: str = "pearson",
               acf_type: str = "bar",
               confidence: bool = True,
               alpha: float = 0.95,
               round_nb: int = 3,
               show: bool = True,
               ax = None,
               **style_kwds,)

Computes the correlations of the input vcolumn and its lags.
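As a rough illustration of what "correlations of the input vcolumn and its lags" means, the sketch below computes the Pearson correlation between a series and a copy of itself shifted by k rows. It is plain Python on an in-memory list, not VerticaPy's in-database implementation. The helper names `pearson` and `lagged_correlations` and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lagged_correlations(values, lags):
    """Correlate the series with itself shifted by each lag (in rows)."""
    result = {0: 1.0}  # a series is perfectly correlated with itself
    for k in lags:
        # Pair each value with the value k rows earlier.
        result[k] = pearson(values[k:], values[:-k])
    return result

# Toy series with a period of 4 rows (assumed data, already ordered by time)
series = [0, 1, 0, -1] * 6
acf = lagged_correlations(series, lags=[1, 2, 3, 4])
print(acf)
```

Because the toy series repeats every 4 rows, the coefficient at lag 4 is close to 1 and the coefficient at lag 2 is close to -1.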

Parameters

Name Type Optional Description
ts
str
TS (Time Series) vcolumn used to order the data. It can be a date or a numerical vcolumn.
column
str
Input vcolumn to use to compute the Auto Correlation Plot.
by
list
vcolumns used in the partition.
p
int / list
If int: the maximum number of lags to consider during the computation. If list: the specific lags to include in the computation. p must be a positive integer or a list of positive integers.
unit
str
Unit to use to compute the lags.
  • rows : Natural lags.
  • else : Any time unit. For example, 'hour' computes the hour lags and 'day' computes the day lags.
method
str
Method to use to compute the correlation.
  • pearson : Pearson correlation coefficient (linear).
  • spearman : Spearman's rank correlation coefficient (monotonic - rank based).
  • kendall : Kendall rank correlation coefficient (similar trends). The method will compute the Tau-B coefficient.
    ⚠ Warning: This method is computationally expensive. It uses a CROSS JOIN during computation, whose complexity is O(n * n), where n is the number of rows in the vDataFrame.
  • cramer : Cramer's V (correlation between categories).
  • biserial : Point-biserial (correlation between a binary and a numerical).
acf_type
str
ACF Type.
  • bar : Classical Autocorrelation Plot using bars.
  • heatmap : Draws the ACF heatmap.
  • line : Draws the ACF using a Line Plot.
confidence
bool
If set to True, the confidence band width is drawn.
alpha
float
Significance Level. Probability to accept H0. Only used to compute the confidence band width.
round_nb
int
Round the coefficient using the input number of digits. It is used only if acf_type is 'heatmap'.
show
bool
If set to True, the Auto Correlation Plot will be drawn using Matplotlib.
ax
Matplotlib axes object
The axes to plot on.
**style_kwds
any
Any optional parameter to pass to the Matplotlib functions.
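
The confidence and alpha parameters control a band around zero that helps judge which lags are significant. One common way to obtain such a band is Bartlett's approximation, and the sketch below implements that textbook formula in plain Python. Whether VerticaPy uses exactly this formula is an assumption; the function name `bartlett_band` and the sample inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def bartlett_band(acf_values, n, alpha=0.95):
    """Half-width of the confidence band for lags 1..len(acf_values) - 1.

    acf_values[k] is the autocorrelation at lag k (acf_values[0] == 1),
    n is the number of observations, alpha the two-sided confidence level.
    """
    # Two-sided normal quantile, about 1.96 for alpha = 0.95
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    widths = []
    for k in range(1, len(acf_values)):
        # Bartlett: var(r_k) ~ (1 + 2 * sum of r_i^2 for 1 <= i < k) / n
        variance = (1 + 2 * sum(r ** 2 for r in acf_values[1:k])) / n
        widths.append(z * sqrt(variance))
    return widths

# Assumed inputs: three lags of a series with 100 observations
band = bartlett_band([1.0, 0.5, 0.25, 0.1], n=100)
for lag, width in enumerate(band, start=1):
    print(f"lag {lag}: +/- {width:.4f}")
```

Under this approximation the band can only widen as the lag grows, which is the same trend the confidence column shows in the example outputs.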

Returns

tablesample : An object containing the result. For more information, see utilities.tablesample.

Example

In [61]:
from verticapy.datasets import load_amazon
amazon = load_amazon()
display(amazon)
     date        state                number
     Date        Varchar(32)          Int
1    1998-01-01  ACRE                 0
2    1998-01-01  ALAGOAS              0
3    1998-01-01  AMAPÁ                0
4    1998-01-01  AMAZONAS             0
5    1998-01-01  BAHIA                0
6    1998-01-01  CEARÁ                0
7    1998-01-01  DISTRITO FEDERAL     0
8    1998-01-01  ESPÍRITO SANTO       0
9    1998-01-01  GOIÁS                0
10   1998-01-01  MARANHÃO             0
11   1998-01-01  MATO GROSSO          0
12   1998-01-01  MATO GROSSO DO SUL   0
13   1998-01-01  MINAS GERAIS         0
14   1998-01-01  PARANÁ               0
15   1998-01-01  PARAÍBA              0
16   1998-01-01  PARÁ                 0
17   1998-01-01  PERNAMBUCO           0
18   1998-01-01  PIAUÍ                0
19   1998-01-01  RIO DE JANEIRO       0
20   1998-01-01  RIO GRANDE DO NORTE  0
21   1998-01-01  RIO GRANDE DO SUL    0
22   1998-01-01  RONDÔNIA             0
23   1998-01-01  RORAIMA              0
24   1998-01-01  SANTA CATARINA       0
25   1998-01-01  SERGIPE              0
26   1998-01-01  SÃO PAULO            0
27   1998-01-01  TOCANTINS            0
28   1998-02-01  ACRE                 0
29   1998-02-01  ALAGOAS              0
30   1998-02-01  AMAPÁ                0
31   1998-02-01  AMAZONAS             0
32   1998-02-01  BAHIA                0
33   1998-02-01  CEARÁ                0
34   1998-02-01  DISTRITO FEDERAL     0
35   1998-02-01  ESPÍRITO SANTO       0
36   1998-02-01  GOIÁS                0
37   1998-02-01  MARANHÃO             0
38   1998-02-01  MATO GROSSO          0
39   1998-02-01  MATO GROSSO DO SUL   0
40   1998-02-01  MINAS GERAIS         0
41   1998-02-01  PARANÁ               0
42   1998-02-01  PARAÍBA              0
43   1998-02-01  PARÁ                 0
44   1998-02-01  PERNAMBUCO           0
45   1998-02-01  PIAUÍ                0
46   1998-02-01  RIO DE JANEIRO       0
47   1998-02-01  RIO GRANDE DO NORTE  0
48   1998-02-01  RIO GRANDE DO SUL    0
49   1998-02-01  RONDÔNIA             0
50   1998-02-01  RORAIMA              0
51   1998-02-01  SANTA CATARINA       0
52   1998-02-01  SERGIPE              0
53   1998-02-01  SÃO PAULO            0
54   1998-02-01  TOCANTINS            0
55   1998-03-01  ACRE                 0
56   1998-03-01  ALAGOAS              0
57   1998-03-01  AMAPÁ                0
58   1998-03-01  AMAZONAS             0
59   1998-03-01  BAHIA                0
60   1998-03-01  CEARÁ                0
61   1998-03-01  DISTRITO FEDERAL     0
62   1998-03-01  ESPÍRITO SANTO       0
63   1998-03-01  GOIÁS                0
64   1998-03-01  MARANHÃO             0
65   1998-03-01  MATO GROSSO          0
66   1998-03-01  MATO GROSSO DO SUL   0
67   1998-03-01  MINAS GERAIS         0
68   1998-03-01  PARANÁ               0
69   1998-03-01  PARAÍBA              0
70   1998-03-01  PARÁ                 0
71   1998-03-01  PERNAMBUCO           0
72   1998-03-01  PIAUÍ                0
73   1998-03-01  RIO DE JANEIRO       0
74   1998-03-01  RIO GRANDE DO NORTE  0
75   1998-03-01  RIO GRANDE DO SUL    0
76   1998-03-01  RONDÔNIA             0
77   1998-03-01  RORAIMA              0
78   1998-03-01  SANTA CATARINA       0
79   1998-03-01  SERGIPE              0
80   1998-03-01  SÃO PAULO            0
81   1998-03-01  TOCANTINS            0
82   1998-04-01  ACRE                 0
83   1998-04-01  ALAGOAS              0
84   1998-04-01  AMAPÁ                0
85   1998-04-01  AMAZONAS             0
86   1998-04-01  BAHIA                0
87   1998-04-01  CEARÁ                0
88   1998-04-01  DISTRITO FEDERAL     0
89   1998-04-01  ESPÍRITO SANTO       0
90   1998-04-01  GOIÁS                0
91   1998-04-01  MARANHÃO             0
92   1998-04-01  MATO GROSSO          0
93   1998-04-01  MATO GROSSO DO SUL   0
94   1998-04-01  MINAS GERAIS         0
95   1998-04-01  PARANÁ               0
96   1998-04-01  PARAÍBA              0
97   1998-04-01  PARÁ                 0
98   1998-04-01  PERNAMBUCO           0
99   1998-04-01  PIAUÍ                0
100  1998-04-01  RIO DE JANEIRO       0
Rows: 1-100 | Columns: 3
In [62]:
# Autocorrelation Plot for each 'month' lag using Spearman coefficients
# p = 48: computes the lags from 1 to 48 months
amazon.acf(ts = "date", 
           column = "number", 
           p = 48,
           by = ["state"],
           unit = "month",
           method = "spearman")
Out[62]:
    value                 confidence
0   1                     0.024396841824873748
1   0.86543680396017      0.03856205308291557
2   0.641922639518878     0.04656468707262692
3   0.39938929350142      0.052519441106804815
4   0.185655799846523     0.057377036550587374
5   0.0459383199230746    0.061640765754674524
6   -0.0093684335920518   0.0655069183976806
7   0.0196168076254727    0.06894501806026314
8   0.127468215578189     0.07161118776156336
9   0.296828511393074     0.07416885585834694
10  0.494663471420921     0.07659027318891991
11  0.669183623721803     0.07887501028576362
12  0.756240777751227     0.08070972531360529
13  0.703656422347367     0.08246563028583806
14  0.545077794147088     0.08399101248988675
15  0.343092597277807     0.08547965236320183
16  0.154975502588019     0.0868462967771087
17  0.0229978976509497    0.08794223779603168
18  -0.0400085787092103   0.08899807709603456
19  -0.023663217087369    0.08984615265637208
20  0.0671919278641314    0.09063193558306873
21  0.215056031781125     0.09126725326056165
22  0.394326540990952     0.0918613330687679
23  0.560174388414002     0.0924395780522622
24  0.65232694499396      0.09293318667622075
25  0.62236564761964      0.09323727861686824
26  0.489241759585553     0.09351085391986542
27  0.30934414528762      0.09373816688889977
28  0.130889801728171     0.09389849041590317
29  -0.00131802232447298  0.09401990717406696
30  -0.069982748843881    0.09413611707174371
31  -0.0700269423512902   0.09424661024229615
32  0.00470065681955768   0.09433115129623874
33  0.133914615889873     0.09440975398732615
34  0.29999224404642      0.0944787937500136
35  0.460378217949214     0.09451720691829557
36  0.558670337789043     0.0945555772769078
37  0.552691671399349     0.09459152125235575
38  0.443166080540354     0.09461816517092184
39  0.274713391212048     0.09464457943128332
40  0.106021727769531     0.09466530944697167
41  -0.0229982303581731   0.09468281718440483
42  -0.0986764341222668   0.0946937428701469
43  -0.110317490821392    0.09470447438051062
44  -0.054853876630662    0.09471520792593062
45  0.0551848919438232    0.09472503189708015
46  0.203993285445905     0.09473297816528793
47  0.35497398680027      0.09474051064208233
48  0.458820853220682     0.09474791601815644
Rows: 1-49 | Columns: 3
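
The coefficients above rise again around lags 12 and 24, which suggests a yearly pattern in the monthly data. One quick way to check this on a result like this one is to look for local maxima. The sketch below does so on the coefficients for lags 0 to 26, copied from the output above and rounded to three decimals. The helper `local_peaks` is hypothetical and not part of VerticaPy.

```python
def local_peaks(values):
    """Return the lags whose coefficient exceeds both neighbouring lags."""
    return [
        k
        for k in range(1, len(values) - 1)
        if values[k] > values[k - 1] and values[k] > values[k + 1]
    ]

# Coefficients for lags 0..26, rounded from the output above
acf_values = [
    1.0, 0.865, 0.642, 0.399, 0.186, 0.046, -0.009, 0.020, 0.127,
    0.297, 0.495, 0.669, 0.756, 0.704, 0.545, 0.343, 0.155, 0.023,
    -0.040, -0.024, 0.067, 0.215, 0.394, 0.560, 0.652, 0.622, 0.489,
]
print(local_peaks(acf_values))  # [12, 24]
```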
In [63]:
# Autocorrelation Plot using only the selected lags
amazon.acf(ts = "date", 
           column = "number", 
           by = ["state"],
           p = [1, 3, 6, 7],
           unit = "year",
           method = "pearson")
Out[63]:
    value  confidence
0   1.0    0.024396841824873748
1   0.616  0.03235856951582219
3   0.477  0.036306700072106235
6   0.395  0.03878403253532033
7   0.367  0.04080280865931269
Rows: 1-5 | Columns: 3
In [64]:
# Autocorrelation Heatmap for each 'month' lag
amazon.acf(ts = "date", 
           column = "number", 
           by = ["state"],
           p = 12,
           unit = "month",
           method = "pearson", 
           round_nb = 3, 
           acf_type = "heatmap")
Out[64]:
                 "number"
"number"         1.0
"lag_12_number"  0.778
"lag_1_number"   0.752
"lag_11_number"  0.577
"lag_2_number"   0.422
"lag_10_number"  0.334
"lag_3_number"   0.201
"lag_9_number"   0.154
"lag_6_number"   -0.06
"lag_4_number"   0.052
"lag_7_number"   -0.043
"lag_5_number"   -0.033
"lag_8_number"   0.027
Rows: 1-13 | Columns: 2
In [65]:
# Autocorrelation Line for each 'month' lag
amazon.acf(ts = "date", 
           column = "number", 
           by = ["state"],
           p = 12,
           unit = "month",
           method = "pearson",
           acf_type = "line")
Out[65]:
    value   confidence
0   1.0     0.024396841824873748
1   0.752   0.03627598368700659
2   0.422   0.04460428511364169
3   0.201   0.0488503589603483
4   0.052   0.05097893985732261
5   -0.033  0.05227002803589448
6   -0.06   0.05273251493184901
7   -0.043  0.05300388580086683
8   0.027   0.053048454728269544
9   0.154   0.053082940352639066
10  0.334   0.05310781786440197
11  0.577   0.05312416251240187
12  0.778   0.053136467322383245
Rows: 1-13 | Columns: 3

See Also

vDataFrame.asfreq Interpolates and computes a regular time interval vDataFrame.
vDataFrame.corr Computes the Correlation Matrix of a vDataFrame.
vDataFrame.cov Computes the Covariance Matrix of the vDataFrame.
vDataFrame.pacf Computes the Partial Autocorrelations of the input vcolumn.