
verticapy.vDataFrame.interpolate

vDataFrame.interpolate(ts: str, rule: str | timedelta, method: dict | None = None, by: str | list[str] | None = None) -> vDataFrame

Computes a vDataFrame with regular time intervals, filling in the missing values with the chosen interpolation techniques.

Parameters

ts: str

TS (Time Series) vDataColumn used to order the data. The vDataColumn type must be date (date, datetime, timestamp…).

rule: TimeInterval

Interval used to create the time slices. The final interpolation is divided by these intervals. For example, specifying ‘5 minutes’ creates records separated by time intervals of ‘5 minutes’.

method: dict, optional

Dictionary of interpolation methods. Must be in the following format: {"column1": "interpolation1", …, "columnk": "interpolationk"}. Each interpolation method must be one of the following:

  • bfill:

    Interpolates with the final value of the time slice.

  • ffill:

    Interpolates with the first value of the time slice.

  • linear:

    Linear interpolation.

by: SQLColumns, optional

vDataColumns used in the partition.

Returns

vDataFrame

The vDataFrame resulting from the interpolation.

Examples

We import verticapy:

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

For this example, we will use dummy time-series data:

vdf = vp.vDataFrame(
    {
        "time": [
            "1993-11-03 00:00:00",
            "1993-11-03 00:00:01",
            "1993-11-03 00:00:02",
            "1993-11-03 00:00:04",
            "1993-11-03 00:00:05",
        ],
        "val": [0., 1., 2., 4., 5.],
    }
)

Let us confirm that the correct data type is associated with time:

vdf["time"].astype("datetime")
Out[3]: 
None                 time    val  
1    1993-11-03 00:00:00    0.0  
2    1993-11-03 00:00:01    1.0  
3    1993-11-03 00:00:02    2.0  
4    1993-11-03 00:00:04    4.0  
5    1993-11-03 00:00:05    5.0  
Rows: 5 | Columns: 2
Column types: time is Datetime, val is Numeric(4).

Note

VerticaPy offers a wide range of sample datasets that are ideal for training and testing purposes. You can explore the full list of available datasets in the Datasets section, which provides detailed information on each dataset and how to use it effectively. These datasets are useful resources for practicing data analysis and machine learning within the VerticaPy environment.

We can see that there is no data for the 3rd second. To fill this gap, we can use the interpolate method:

vdf.interpolate(
    ts = "time",
    rule = "1 second",
    method = {"val": "linear"},
)
                    time    val  
1    1993-11-03 00:00:00    0.0  
2    1993-11-03 00:00:01    1.0  
3    1993-11-03 00:00:02    2.0  
4    1993-11-03 00:00:03    3.0  
5    1993-11-03 00:00:04    4.0  
6    1993-11-03 00:00:05    5.0  
Rows: 6 | Columns: 2

Column types: time is Timestamp(29), val is Float(22).
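
To make the "linear" method more concrete, the following is a minimal pure-Python sketch of linear interpolation onto a regular 1-second grid, applied to the same dummy data. It is only an illustration of the idea and not VerticaPy's implementation, which runs inside the database; the helper name linear_interpolate is hypothetical and is not part of the library.

```python
from datetime import datetime, timedelta


def linear_interpolate(points, step):
    """Resample sorted (timestamp, value) pairs onto a regular grid.

    Hypothetical helper for illustration only; not part of VerticaPy.
    Grid points that fall between two observations get a value on the
    straight line joining those observations.
    """
    result = []
    current = points[0][0]
    i = 0  # index of the latest observation at or before `current`
    while current <= points[-1][0]:
        # Move to the last observation that is not after `current`.
        while i + 1 < len(points) and points[i + 1][0] <= current:
            i += 1
        t0, v0 = points[i]
        if t0 == current or i + 1 == len(points):
            value = v0  # an observation exists at this grid point
        else:
            t1, v1 = points[i + 1]
            weight = (current - t0) / (t1 - t0)  # timedelta ratio is a float
            value = v0 + weight * (v1 - v0)
        result.append((current, value))
        current += step
    return result


start = datetime(1993, 11, 3)
# Same observations as the example: the 3rd second is missing.
observed = [(start + timedelta(seconds=s), float(s)) for s in (0, 1, 2, 4, 5)]
filled = linear_interpolate(observed, timedelta(seconds=1))
for ts, val in filled:
    print(ts, val)
```

The sketch yields six rows, with the missing 3rd second filled by the value halfway between its two neighbours (3.0), which agrees with the result of the interpolate call above.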

See also

vDataFrame.fillna() : Fill the missing values using the input method.
vDataColumn.fill_outliers() : Fill the outliers using the input method.