verticapy.machine_learning.vertica.tsa.ensemble.TimeSeriesByCategory#
- class verticapy.machine_learning.vertica.tsa.ensemble.TimeSeriesByCategory(name: str = None, overwrite_model: bool = False, base_model: TimeSeriesModelBase | None = None)#
This model is built from multiple base models: one base model is fitted for each category. Refer to the documentation of the base models for complete examples.
Important
This is still Beta.
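The idea of fitting one base model per category can be sketched in plain Python. This is only a toy illustration under stated assumptions, not VerticaPy's implementation: `MeanForecaster` and `ByCategorySketch` are hypothetical names, the rows are made-up values, and sorting the categories alphabetically is a choice made for this sketch. The argument names `ts`, `y`, and `by` mirror those of `fit`.

```python
from collections import defaultdict
from copy import deepcopy


class MeanForecaster:
    """Hypothetical stand-in for a base model: it predicts the training mean."""

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self

    def predict(self, npredictions):
        return [self.mean_] * npredictions


class ByCategorySketch:
    """Toy illustration only; not VerticaPy's implementation."""

    def __init__(self, base_model):
        self.base_model = base_model

    def fit(self, rows, ts, y, by):
        # Order the rows by timestamp, then collect the response per category.
        groups = defaultdict(list)
        for row in sorted(rows, key=lambda r: r[ts]):
            groups[row[by]].append(row[y])
        # One sequential list of categories, one fitted copy of the base model each.
        self.distinct = sorted(groups)
        self.models = [
            deepcopy(self.base_model).fit(groups[category])
            for category in self.distinct
        ]
        return self

    def predict(self, idx, npredictions):
        # idx is the position of the category in self.distinct.
        return self.models[idx].predict(npredictions)


# Made-up rows, used only to exercise the sketch.
rows = [
    {"date": "1998-01-01", "state": "SERGIPE", "number": 2},
    {"date": "1998-01-01", "state": "PERNAMBUCO", "number": 10},
    {"date": "1998-02-01", "state": "SERGIPE", "number": 4},
    {"date": "1998-02-01", "state": "PERNAMBUCO", "number": 20},
]
model = ByCategorySketch(MeanForecaster()).fit(rows, ts="date", y="number", by="state")
forecast = model.predict(idx=0, npredictions=2)
```

In this sketch, each category gets its own independent copy of the base model, so a forecast is always requested for one category at a time through its position in `distinct`.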
Parameters#
- name: str, optional
Name of the model. The model is stored in the database.
- overwrite_model: bool, optional
If set to True, training a model with the same name as an existing model overwrites the existing model.
- base_model: TimeSeriesModelBase
The base model to use for each category. It can be one of the following: ARIMA, ARMA, AR, or MA.
Attributes#
Many attributes are created during the fitting phase.
- distinct: list
Sequential list of the categories used to build the different models.
- ts: str
Name of the timestamp column.
- y: str
Name of the column used to build the model.
- _is_already_stored: bool
Whether the model is stored in the Vertica database.
- _get_model_names: list
List of the names of the models created.
Examples#
The following examples provide a basic understanding of usage.
Initialization#
For this example, we will use a subset of the amazon dataset.
import verticapy.datasets as vpd
amazon_full = vpd.load_amazon()
     date        state           number
     (Date)      (Varchar(32))   (Integer)
1    1998-01-01  ACRE            0
2    1998-01-01  ALAGOAS         0
3    1998-01-01  AMAPÁ           0
4    1998-01-01  AMAZONAS        0
5    1998-01-01  BAHIA           0
...
100  1998-04-01  RIO DE JANEIRO  0
Rows: 1-100 | Columns: 3
To keep this example simple, we restrict the dataset to two states:
amazon = amazon_full[(amazon_full["state"] == "PERNAMBUCO") | (amazon_full["state"] == "SERGIPE")]
Now we can set up a base model that will be created for each unique state in the dataset. For this example, we use ARIMA.
from verticapy.machine_learning.vertica.tsa import ARIMA
base_model = ARIMA(order = (2, 1, 2))
Finally, we can initialize all the models in one go:
from verticapy.machine_learning.vertica.tsa.ensemble import TimeSeriesByCategory
model = TimeSeriesByCategory(base_model = base_model)
Model Fitting#
We can now fit the model:
model.fit(amazon, ts = "date", y = "number", by = "state")
Important
To train a model, you can directly use the vDataFrame or the name of the relation stored in the database. The test set is optional and is only used to compute the test metrics. In verticapy, we don’t work using X matrices and y vectors. Instead, we work directly with lists of predictors and the response name.
Plots#
We can conveniently plot the predictions on a line plot to observe the efficacy of our model. We need to provide the idx, which represents the model number.
model.plot(idx = 0, npredictions = 5)
Note
You can find the name of each category in the distinct attribute. The categories in this sequential list correspond, in order, to idx = 0, 1, ...
model.distinct
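Because idx is a position in the list of categories, a small lookup table makes it easy to go from a category name to its model number. The following is a minimal sketch: the list below is a stand-in for the value of model.distinct, its order is assumed for illustration, and idx_by_category is a hypothetical helper name.

```python
# Stand-in for model.distinct after fitting on the two-state subset.
# The order shown here is an assumption; use the order the fitted model reports.
distinct = ["PERNAMBUCO", "SERGIPE"]

# Map each category name to the idx of its submodel.
idx_by_category = {category: idx for idx, category in enumerate(distinct)}

idx = idx_by_category["SERGIPE"]
```

With a fitted model, the resulting idx could then be passed to a call such as model.plot(idx = idx, npredictions = 5).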
- __init__(name: str = None, overwrite_model: bool = False, base_model: TimeSeriesModelBase | None = None) -> None#
Must be overridden in the child class
Methods
- __init__([name, overwrite_model, base_model])
Must be overridden in the child class
- contour([nbins, chart])
Draws the model's contour plot.
- deploySQL([vdf, ts, y, start, npredictions, ...])
Returns the SQL code needed to deploy the model.
- does_model_exists(name[, raise_error, ...])
Checks whether the model is stored in the Vertica database.
- drop()
Drops the model from the Vertica database.
- export_models(name, path[, kind])
Exports machine learning models.
- features_importance([idx, show, chart])
Computes the input submodel's features importance.
- fit(input_relation, ts, y, by[, ...])
Trains the model.
- get_attributes([attr_name])
Returns the model attributes.
- get_match_index(x, col_list[, str_check])
Returns the matching index.
Returns the parameters of the model.
- get_plotting_lib([class_name, chart, ...])
Returns the first available library (Plotly, Matplotlib, or Highcharts) to draw a specific graphic.
- get_vertica_attributes([attr_name])
Returns the model Vertica attributes.
- import_models(path[, schema, kind])
Imports machine learning models.
- plot([idx, vdf, ts, y, start, npredictions, ...])
Draws the input submodel.
- predict([vdf, ts, y, start, npredictions, ...])
Predicts using the input relation.
- register(registered_name[, raise_error])
Registers the model and adds it to in-DB Model versioning environment with a status of 'under_review'.
- regression_report([metrics, start, ...])
Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...).
- report([metrics, start, npredictions, method])
Computes a regression report using multiple metrics to evaluate the model (r2, mse, max error...).
- score([metric, start, npredictions, method])
Computes the model score.
- set_params([parameters])
Sets the parameters of the model.
Summarizes the model.
- to_binary(path)
Exports the model to the Vertica Binary format.
- to_pmml(path)
Exports the model to PMML.
- to_python([return_proba, ...])
Returns the Python function needed for in-memory scoring without using built-in Vertica functions.
- to_sql([X, return_proba, ...])
Returns the SQL code needed to deploy the model without using built-in Vertica functions.
- to_tf(path)
Exports the model to the Frozen Graph format (TensorFlow).
Attributes