verticapy.machine_learning.model_selection.statistical_tests.tsa.seasonal_decompose#
- verticapy.machine_learning.model_selection.statistical_tests.tsa.seasonal_decompose(input_relation: str | vDataFrame, columns: str | list[str], ts: str, by: str | list[str] | None = None, period: int | tuple | list = -1, polynomial_order: int | tuple | list = 1, estimate_seasonality: bool = True, rule: str | timedelta | None = None, mult: bool = False, two_sided: bool = False, use_row: bool = True, genSQL: bool = False) → vDataFrame#
Performs a seasonal time series decomposition. Seasonal decomposition plots are graphical representations of the decomposition of time series data into its various components: trend, seasonality, and residual (error). Seasonal decomposition is a technique used to break down a time series into these underlying components to better understand its patterns and behavior.
Seasonal decomposition plots are useful for several purposes:
- Trend Analysis:
Understanding the long-term direction or behavior of the time series.
- Seasonal Patterns:
Identifying repeating patterns or cycles within the data.
- Anomaly Detection:
Spotting unusual behavior or outliers in the residuals.
- Modeling:
Informing the choice of appropriate models for forecasting or analysis.
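To make the three components concrete, here is a minimal sketch of an additive decomposition in plain Python on a synthetic series. It is an illustration only, not VerticaPy's implementation: the helper names (fit_line, decompose_additive), the linear trend, and the per-phase seasonal means are all assumptions chosen for brevity.

```python
# Illustrative only: a tiny additive decomposition in plain Python.
# This is NOT how VerticaPy computes it; every name here is hypothetical.

def fit_line(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def decompose_additive(values, period):
    """Split values into trend + seasonal + residual for a known period."""
    slope, intercept = fit_line(values)
    trend = [intercept + slope * t for t in range(len(values))]
    detrended = [y - tr for y, tr in zip(values, trend)]
    # Seasonal component: mean of the detrended values at each phase of the cycle.
    phase_means = []
    for p in range(period):
        phase = detrended[p::period]
        phase_means.append(sum(phase) / len(phase))
    seasonal = [phase_means[t % period] for t in range(len(values))]
    # Residual: whatever the trend and the seasonal component do not explain.
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


# Synthetic series: linear growth plus a repeating pattern of length 4.
pattern = [3.0, -1.0, -2.0, 0.0]
series = [10.0 + 0.5 * t + pattern[t % 4] for t in range(24)]

trend, seasonal, residual = decompose_additive(series, period=4)
recomposed = [a + b + c for a, b, c in zip(trend, seasonal, residual)]
```

By construction, adding the three lists back together returns the original series, which is the property that makes the components useful for trend analysis, anomaly detection on the residual, and modeling.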
Parameters#
- input_relation: SQLRelation
Input relation.
- columns: SQLColumns
Input vDataColumn to decompose.
- ts: str
Time series vDataColumn used to order the data. It can be of type date or a numerical vDataColumn.
- by: SQLColumns, optional
vDataColumn used in the partition.
- period: int | tuple | list, optional
Time series period. It is used to retrieve the seasonality component. If period <= 0, the seasonal component is estimated using ACF. In this case, polynomial_order must be greater than 0. It can be an int or a list | tuple of int, each one representing the period of the i-th column.
- polynomial_order: int | tuple | list, optional
If greater than 0, the trend is estimated using a polynomial of degree polynomial_order and the parameter two_sided is ignored. If equal to 0, the trend is estimated using Moving Averages. It can be an int or a list | tuple of int, each one representing the polynomial_order of the i-th column.
- estimate_seasonality: bool, optional
If set to True, the seasonality is estimated using cosine and sine functions.
- rule: TimeInterval, optional
Interval used to slice the time. For example, '5 minutes' creates records separated by '5 minutes' time interval.
- mult: bool, optional
If set to True, the decomposition type is ‘multiplicative’. Otherwise, ‘additive’.
- two_sided: bool, optional
If set to True, a centered moving average is used for the trend isolation. Otherwise, only past values are used.
- use_row: bool, optional
If set to True, the ROW datatype is used to merge all the different columns time series components together.
- genSQL: bool, optional
If set to True, the SQL code for creating the final relation is generated but not executed.
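Two of the parameters above are easier to picture with a small example: the difference between a centered and a past-only moving average (two_sided), and the idea of reading a period off the autocorrelation function (period <= 0). The sketch below is plain Python and only illustrates the concepts. The function names, the odd window size, and the "largest autocorrelation at lag >= 2" rule are assumptions, not a description of what VerticaPy does internally.

```python
import math

# Illustrative only: hypothetical helpers, not part of VerticaPy.

def moving_average(values, window, two_sided):
    """Moving-average trend; positions without a full window are None.

    Assumes an odd window so that the centered variant is symmetric.
    """
    n = len(values)
    out = [None] * n
    half = window // 2
    for i in range(n):
        if two_sided:
            lo, hi = i - half, i + half    # centered: past and future values
        else:
            lo, hi = i - window + 1, i     # trailing: past values only
        if lo >= 0 and hi < n:
            chunk = values[lo:hi + 1]
            out[i] = sum(chunk) / len(chunk)
    return out


def estimate_period(values, max_lag):
    """Return the lag in [2, max_lag] with the largest autocorrelation."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    best_lag, best_r = None, float("-inf")
    for lag in range(2, max_lag + 1):
        r = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denom
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


ramp = [1, 2, 3, 4, 5, 6, 7]
centered_trend = moving_average(ramp, window=3, two_sided=True)
trailing_trend = moving_average(ramp, window=3, two_sided=False)

# A clean sine wave with a 12-step cycle, ten cycles long.
wave = [math.sin(2 * math.pi * t / 12) for t in range(120)]
period = estimate_period(wave, max_lag=30)
```

On the ramp, the centered average tracks the series at each interior point, while the trailing average lags behind it by one step and leaves the first two positions empty. On the sine wave, the autocorrelation peaks at a lag of 12, the length of one cycle.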
Returns#
- vDataFrame
object containing the different time series components.
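As a reading aid, the returned components can be related to the original column as follows. This assumes the standard additive and multiplicative forms. The multiplicative form is consistent with the example output in the Examples section, where the product of the trend, seasonal, and epsilon columns reproduces the passengers column. The symbols are notation introduced here: y_t is the input column, T_t the trend, S_t the seasonal component, and ε_t the residual.

```latex
\begin{aligned}
\text{additive (mult = False):} \quad & y_t = T_t + S_t + \varepsilon_t \\
\text{multiplicative (mult = True):} \quad & y_t = T_t \times S_t \times \varepsilon_t
\end{aligned}
```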
Examples#
Let us use a dataset that has seasonality. The Airline passengers dataset is a good example.
import verticapy.datasets as vpd
data = vpd.load_airline_passengers()
     date        passengers
     Date        Integer
  1  1949-01-01  112
  2  1949-02-01  118
  3  1949-03-01  132
  4  1949-04-01  129
  5  1949-05-01  121
  6  1949-06-01  135
  7  1949-07-01  148
  8  1949-08-01  148
  9  1949-09-01  136
 10  1949-10-01  119
 11  1949-11-01  104
 12  1949-12-01  118
 13  1950-01-01  115
 14  1950-02-01  126
 15  1950-03-01  141
 16  1950-04-01  135
 17  1950-05-01  125
 18  1950-06-01  149
 19  1950-07-01  170
 20  1950-08-01  170
 21  1950-09-01  158
 22  1950-10-01  133
 23  1950-11-01  114
 24  1950-12-01  140
 25  1951-01-01  145
 26  1951-02-01  150
 27  1951-03-01  178
 28  1951-04-01  163
 29  1951-05-01  172
 30  1951-06-01  178
 31  1951-07-01  199
 32  1951-08-01  199
 33  1951-09-01  184
 34  1951-10-01  162
 35  1951-11-01  146
 36  1951-12-01  166
 37  1952-01-01  171
 38  1952-02-01  180
 39  1952-03-01  193
 40  1952-04-01  181
 41  1952-05-01  183
 42  1952-06-01  218
 43  1952-07-01  230
 44  1952-08-01  242
 45  1952-09-01  209
 46  1952-10-01  191
 47  1952-11-01  172
 48  1952-12-01  194
 49  1953-01-01  196
 50  1953-02-01  196
 51  1953-03-01  236
 52  1953-04-01  235
 53  1953-05-01  229
 54  1953-06-01  243
 55  1953-07-01  264
 56  1953-08-01  272
 57  1953-09-01  237
 58  1953-10-01  211
 59  1953-11-01  180
 60  1953-12-01  201
 61  1954-01-01  204
 62  1954-02-01  188
 63  1954-03-01  235
 64  1954-04-01  227
 65  1954-05-01  234
 66  1954-06-01  264
 67  1954-07-01  302
 68  1954-08-01  293
 69  1954-09-01  259
 70  1954-10-01  229
 71  1954-11-01  203
 72  1954-12-01  229
 73  1955-01-01  242
 74  1955-02-01  233
 75  1955-03-01  267
 76  1955-04-01  269
 77  1955-05-01  270
 78  1955-06-01  315
 79  1955-07-01  364
 80  1955-08-01  347
 81  1955-09-01  312
 82  1955-10-01  274
 83  1955-11-01  237
 84  1955-12-01  278
 85  1956-01-01  284
 86  1956-02-01  277
 87  1956-03-01  317
 88  1956-04-01  313
 89  1956-05-01  318
 90  1956-06-01  374
 91  1956-07-01  413
 92  1956-08-01  405
 93  1956-09-01  355
 94  1956-10-01  306
 95  1956-11-01  271
 96  1956-12-01  306
 97  1957-01-01  315
 98  1957-02-01  301
 99  1957-03-01  356
100  1957-04-01  348
Rows: 1-100 | Columns: 2
Note
VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use it effectively. These datasets are invaluable resources for honing your data analysis and machine learning skills within the VerticaPy environment.
Data Visualization#
Let us first look at the data:
data["passengers"].plot(ts = "date")
We can visually observe:
- An overall increasing trend
- A seasonal component
- Some noise
Now we can use seasonal_decompose to separate these three components.
Decomposition#
We can directly apply the function to the dataset:
from verticapy.machine_learning.model_selection.statistical_tests import seasonal_decompose

decomposition = seasonal_decompose(
    data,
    "passengers",
    "date",
    polynomial_order = 2,
    mult = True,
    use_row = False,
)
     date        passengers  passengers_trend  passengers_seasonal  passengers_epsilon
     Date        Integer     Float(22)         Float(22)            Float(22)
  1  1949-01-01  112         114.028040859707  0.850135342325453    1.15536255667615
  2  1949-02-01  118         115.690060606611  0.882131323741451    1.15625258033391
  3  1949-03-01  132         117.366096750151  0.945710121773853    1.1892501982632
  4  1949-04-01  129         119.056149290327  1.02383584883414     1.05829695500561
  5  1949-05-01  121         120.760218227138  1.0955747794484      0.914575272399499
  6  1949-06-01  135         122.478303560585  1.14170452509268     0.96543020834606
  7  1949-07-01  148         124.210405290668  1.14986465767455     1.0362320267397
  8  1949-08-01  148         125.956523417387  1.11786867625855     1.05111507184691
  9  1949-09-01  136         127.716657940742  1.05428987822615     1.01002314478973
 10  1949-10-01  119         129.490808860732  0.976164151165865    0.941423777800471
 11  1949-11-01  104         131.278976177358  0.9044252205516      0.875922119767362
 12  1949-12-01  118         133.08115989062   0.858295474907316    1.03306728138343
 13  1950-01-01  115         134.897360000518  0.850135342325453    1.00278151254438
 14  1950-02-01  126         136.727576507052  0.882131323741451    1.04467496042437
 15  1950-03-01  141         138.571809410221  0.945710121773853    1.07593537733039
 16  1950-04-01  135         140.430058710027  1.02383584883414     0.938951930312987
 17  1950-05-01  125         142.302324406468  1.0955747794484      0.80178142914427
 18  1950-06-01  149         144.188606499545  1.14170452509268     0.905110496502086
 19  1950-07-01  170         146.088904989257  1.14986465767455     1.01201036804486
 20  1950-08-01  170         148.003219875606  1.11786867625855     1.02751216102852
 21  1950-09-01  158         149.93155115859   1.05428987822615     0.999548831409235
 22  1950-10-01  133         151.87389883821   0.976164151165865    0.897109889738009
 23  1950-11-01  114         153.830262914466  0.9044252205516      0.819389519154582
 24  1950-12-01  140         155.800643387357  0.858295474907316    1.04694041708226
 25  1951-01-01  145         157.785040256885  0.850135342325453    1.08097115679071
 26  1951-02-01  150         159.783453523048  0.882131323741451    1.06420724593379
 27  1951-03-01  178         161.795883185847  0.945710121773853    1.16330741963279
 28  1951-04-01  163         163.822329245282  1.02383584883414     0.971816292961965
 29  1951-05-01  172         165.862791701353  1.0955747794484      0.946536683551796
 30  1951-06-01  178         167.917270554059  1.14170452509268     0.928476494746238
 31  1951-07-01  199         169.985765803402  1.14986465767455     1.0181078700925
 32  1951-08-01  199         172.06827744938   1.11786867625855     1.03457380574925
 33  1951-09-01  184         174.164805491994  1.05428987822615     1.00206844586813
 34  1951-10-01  162         176.275349931243  0.976164151165865    0.941457185706776
 35  1951-11-01  146         178.399910767129  0.9044252205516      0.904868684520885
 36  1951-12-01  166         180.53848799965   0.858295474907316    1.07127621905567
 37  1952-01-01  171         182.691081628807  0.850135342325453    1.10100855124648
 38  1952-02-01  180         184.8576916546    0.882131323741451    1.10382883718317
 39  1952-03-01  193         187.038318077029  0.945710121773853    1.09111036979155
 40  1952-04-01  181         189.232960896093  1.02383584883414     0.934224942460824
 41  1952-05-01  183         191.441620111793  1.0955747794484      0.872514599777302
 42  1952-06-01  218         193.66429572413   1.14170452509268     0.98594618389933
 43  1952-07-01  230         195.900987733102  1.14986465767455     1.02104406353842
 44  1952-08-01  242         198.151696138709  1.11786867625855     1.09251343123325
 45  1952-09-01  209         200.416420940953  1.05428987822615     0.989129026987771
 46  1952-10-01  191         202.695162139832  0.976164151165865    0.965310720813758
 47  1952-11-01  172         204.987919735347  0.9044252205516      0.927742632802777
 48  1952-12-01  194         207.294693727498  0.858295474907316    1.09037709971173
 49  1953-01-01  196         209.615484116285  0.850135342325453    1.09987831484757
 50  1953-02-01  196         211.950290901707  0.882131323741451    1.04830778405767
 51  1953-03-01  236         214.299114083766  0.945710121773853    1.16448417255751
 52  1953-04-01  235         216.66195366246   1.02383584883414     1.05938757592082
 53  1953-05-01  229         219.03880963779   1.0955747794484      0.954272449518939
 54  1953-06-01  243         221.429682009755  1.14170452509268     0.961206535149543
 55  1953-07-01  264         223.834570778357  1.14986465767455     1.02572286787026
 56  1953-08-01  272         226.253475943594  1.11786867625855     1.07543175747877
 57  1953-09-01  237         228.686397505467  1.05428987822615     0.982987456097118
 58  1953-10-01  211         231.133335463976  0.976164151165865    0.935183884342593
 59  1953-11-01  180         233.594289819121  0.9044252205516      0.851996122660106
 60  1953-12-01  201         236.069260570902  0.858295474907316    0.992018553768322
 61  1954-01-01  204         238.558247719318  0.850135342325453    1.00588344387249
 62  1954-02-01  188         241.06125126437   0.882131323741451    0.884091467154869
 63  1954-03-01  235         243.578271206058  0.945710121773853    1.02016702411449
 64  1954-04-01  227         246.109307544382  1.02383584883414     0.90088112279118
 65  1954-05-01  234         248.654360279341  1.0955747794484      0.858969517422285
 66  1954-06-01  264         251.213429410937  1.14170452509268     0.920465154301832
 67  1954-07-01  302         253.786514939168  1.14986465767455     1.03488400674893
 68  1954-08-01  293         256.373616864035  1.11786867625855     1.02235918479967
 69  1954-09-01  259         258.974735185538  1.05428987822615     0.948598272370131
 70  1954-10-01  229         261.589869903676  0.976164151165865    0.896791959861755
 71  1954-11-01  203         264.219021018451  0.904425220551599    0.849492002088961
 72  1954-12-01  229         266.862188529861  0.858295474907316    0.999796509368645
 73  1955-01-01  242         269.519372437907  0.850135342325453    1.05617846442999
 74  1955-02-01  233         272.190572742588  0.882131323741451    0.970397340910711
 75  1955-03-01  267         274.875789443906  0.945710121773853    1.02710947237432
 76  1955-04-01  269         277.575022541859  1.02383584883414     0.946545650691677
 77  1955-05-01  270         280.288272036449  1.0955747794484      0.879258978768822
 78  1955-06-01  315         283.015537927674  1.14170452509268     0.97486965430864
 79  1955-07-01  364         285.756820215534  1.14986465767455     1.10779156345745
 80  1955-08-01  347         288.512118900031  1.11786867625855     1.0759067383181
 81  1955-09-01  312         291.281433981163  1.05428987822615     1.01597203262628
 82  1955-10-01  274         294.064765458932  0.976164151165865    0.954519307335023
 83  1955-11-01  237         296.862113333336  0.904425220551599    0.882715813208691
 84  1955-12-01  278         299.673477604375  0.858295474907316    1.0808356610778
 85  1956-01-01  284         302.498858272051  0.850135342325453    1.10434947467745
 86  1956-02-01  277         305.338255336362  0.882131323741451    1.0284076664753
 87  1956-03-01  317         308.19166879731   0.945710121773853    1.08762787713421
 88  1956-04-01  313         311.059098654893  1.02383584883414     0.982813461528788
 89  1956-05-01  318         313.940544909111  1.0955747794484      0.924565505078088
 90  1956-06-01  374         316.836007559966  1.14170452509268     1.03391145664116
 91  1956-07-01  413         319.745486607456  1.14986465767455     1.12330813254738
 92  1956-08-01  405         322.668982051583  1.11786867625855     1.12281192700754
 93  1956-09-01  355         325.606493892345  1.05428987822615     1.03413028805787
 94  1956-10-01  306         328.558022129742  0.976164151165865    0.954083744566332
 95  1956-11-01  271         331.523566763776  0.904425220551599    0.903820567972856
 96  1956-12-01  306         334.503127794446  0.858295474907316    1.06582136238505
 97  1957-01-01  315         337.496705221751  0.850135342325453    1.09787512462987
 98  1957-02-01  301         340.504299045692  0.882131323741451    1.00209905441883
 99  1957-03-01  356         343.525909266269  0.945710121773853    1.09580294411067
100  1957-04-01  348         346.561535883481  1.02383584883414     0.980773114758087
Rows: 1-100 | Columns: 5
We can see that there are now three new columns capturing the three elements of data.
Let’s visualize them.
Seasonality
decomposition["passengers_seasonal"].plot(ts = "date")
Trend
decomposition["passengers_trend"].plot(ts = "date")
Noise
decomposition["passengers_epsilon"].plot(ts = "date")
Note
Seasonal decomposition lets us extract the residual, predict its values, and obtain the trend and seasonal information needed to compute the time series. By combining the individual components, we can then recompose the series.
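As a quick sanity check of the recomposition, the sketch below multiplies the three components for the first three rows of the decomposition output shown above, which was computed with mult = True. The values are copied from that table. The variable names are only illustrative, and the check runs in plain Python without a database connection.

```python
# First three rows of the decomposition output (mult = True), copied from the
# table above as (passengers, trend, seasonal, epsilon).
rows = [
    (112, 114.028040859707, 0.850135342325453, 1.15536255667615),
    (118, 115.690060606611, 0.882131323741451, 1.15625258033391),
    (132, 117.366096750151, 0.945710121773853, 1.1892501982632),
]

# Multiplicative recomposition: trend * seasonal * epsilon.
recomposed = [trend * seasonal * eps for _, trend, seasonal, eps in rows]

# Absolute difference between the recomposed value and the original count.
errors = [abs(value - row[0]) for value, row in zip(recomposed, rows)]

for row, value in zip(rows, recomposed):
    print(row[0], round(value, 6))
```

Each product matches the original passengers value up to rounding in the displayed digits. For an additive decomposition (mult = False), the same check would presumably use a sum instead of a product.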