
verticapy.machine_learning.vertica.tsa.ensemble.TimeSeriesByCategory#

class verticapy.machine_learning.vertica.tsa.ensemble.TimeSeriesByCategory(name: str = None, overwrite_model: bool = False, base_model: TimeSeriesModelBase | None = None)#

This model is built on top of multiple base models, one per category. Refer to the documentation of the base models for complete examples.

Important

This is still Beta.

Parameters#

name: str, optional

Name of the model. The model is stored in the database.

overwrite_model: bool, optional

If set to True, training a model with the same name as an existing model overwrites the existing model.

base_model: TimeSeriesModelBase

The user should provide a base model which will be used for each category. It can be one of:

- ARIMA
- ARMA
- AR
- MA

Attributes#

Many attributes are created during the fitting phase.

distinct: list

This provides a sequential list of the categories used to build the different models.

ts: str

The name of the timestamp column.

y: str

The column name used for building the model.

_is_already_stored: bool

This tells us whether a model is stored in the Vertica database.

_get_model_names: list

This returns the list of names of the models created.
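
For instance, once the model from the Examples section below has been fitted, these attributes can be read directly from the model object. A minimal sketch (the values shown in the comments are illustrative, not actual output):

# Inspect the fitted attributes (values in comments are illustrative only)
model.distinct   # e.g. ['PERNAMBUCO', 'SERGIPE'] - one submodel per category
model.ts         # 'date'   - the timestamp column used for training
model.y          # 'number' - the response column used for training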

Examples#

The following examples provide a basic understanding of usage.

Initialization#

For this example, we will use a subset of the amazon dataset.

import verticapy.datasets as vpd

amazon_full = vpd.load_amazon()
[Output: a preview of the amazon dataset with columns date (Date), state (Varchar(32)), and number (Integer); each row gives the value of number for one Brazilian state and month, e.g. 1998-01-01, ACRE, 0. Rows: 1-100 | Columns: 3]

For simplicity in this example, we reduce the number of states:

amazon = amazon_full[(amazon_full["state"] == "PERNAMBUCO") | (amazon_full["state"] == "SERGIPE")]

Now we can set up a base model that will be created for each unique state in the dataset. For this example, we use ARIMA.

from verticapy.machine_learning.vertica.tsa import ARIMA

base_model = ARIMA(order = (2, 1, 2))

Finally, we can initialize our multiple models in one go:

from verticapy.machine_learning.vertica.tsa.ensemble import TimeSeriesByCategory

model = TimeSeriesByCategory(base_model = base_model)

Model Fitting#

We can now fit the model:

model.fit(amazon, ts = "date", y = "number", by = "state")

Important

To train a model, you can directly use the vDataFrame or the name of the relation stored in the database. The test set is optional and is only used to compute the test metrics. In verticapy, we don’t work using X matrices and y vectors. Instead, we work directly with lists of predictors and the response name.
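
As a hedged illustration of this note, the same fit could also be expressed against the name of a stored relation instead of the vDataFrame; the table name below is hypothetical and assumes the subset was saved to the database first:

# Equivalent fit using the name of a stored relation (hypothetical table name)
model.fit("public.amazon_subset", ts = "date", y = "number", by = "state")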

Plots#

We can conveniently plot the predictions on a line plot to observe the efficacy of our model. We need to provide idx, which represents the submodel number.

model.plot(idx = 0, npredictions = 5)

Note

You can find the name of each category through the distinct attribute. The sequential list of categories corresponds to idx = 0, 1, ...:

model.distinct
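
A minimal sketch building on this note: since the i-th entry of model.distinct corresponds to idx = i, you can loop over the categories and plot each submodel in turn (the loop below is illustrative):

# idx = i corresponds to the category model.distinct[i]
for i, category in enumerate(model.distinct):
    print(f"Submodel {i} was trained on category: {category}")
    model.plot(idx = i, npredictions = 5)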

__init__(name: str = None, overwrite_model: bool = False, base_model: TimeSeriesModelBase | None = None) None#

Must be overridden in the child class

Methods

__init__([name, overwrite_model, base_model])

Must be overridden in the child class

contour([nbins, chart])

Draws the model's contour plot.

deploySQL([vdf, ts, y, start, npredictions, ...])

Returns the SQL code needed to deploy the model.

does_model_exists(name[, raise_error, ...])

Checks whether the model is stored in the Vertica database.

drop()

Drops the model from the Vertica database.

export_models(name, path[, kind])

Exports machine learning models.

features_importance([idx, show, chart])

Computes the input submodel's features importance.

fit(input_relation, ts, y, by[, ...])

Trains the model.

get_attributes([attr_name])

Returns the model attributes.

get_match_index(x, col_list[, str_check])

Returns the matching index.

get_params()

Returns the parameters of the model.

get_plotting_lib([class_name, chart, ...])

Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.

get_vertica_attributes([attr_name])

Returns the model Vertica attributes.

import_models(path[, schema, kind])

Imports machine learning models.

plot([idx, vdf, ts, y, start, npredictions, ...])

Draws the input submodel.

predict([vdf, ts, y, start, npredictions, ...])

Predicts using the input relation.

register(registered_name[, raise_error])

Registers the model and adds it to in-DB Model versioning environment with a status of 'under_review'.

regression_report([metrics, start, ...])

Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...).

report([metrics, start, npredictions, method])

Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...).

score([metric, start, npredictions, method])

Computes the model score.

set_params([parameters])

Sets the parameters of the model.

summarize()

Summarizes the model.

to_binary(path)

Exports the model to the Vertica Binary format.

to_pmml(path)

Exports the model to PMML.

to_python([return_proba, ...])

Returns the Python function needed for in-memory scoring without using built-in Vertica functions.

to_sql([X, return_proba, ...])

Returns the SQL code needed to deploy the model without using built-in Vertica functions.

to_tf(path)

Exports the model to the Frozen Graph format (TensorFlow).

Attributes