verticapy.vDataFrame.sessionize#

vDataFrame.sessionize(ts: str, by: str | list[str] | None = None, session_threshold: str = '30 minutes', name: str = 'session_id') → vDataFrame#

Adds a new vDataColumn to the vDataFrame that identifies sessions (periods of continuous user activity). Rows are ordered by ts, and a session ends when ts - lag(ts), the gap to the previous row, is greater than the given threshold.

Parameters#

ts: str

vDataColumn used as the timeline, which determines the order of the data. It can be numerical or of a date-like type (date, datetime, timestamp…).

by: SQLColumns, optional

vDataColumn(s) used to partition the data.

session_threshold: str, optional

Threshold that determines the end of a session. For example, if it is set to '10 minutes', the session ends after 10 minutes of inactivity.

name: str, optional

Name of the new vDataColumn that stores the session identifier.

Returns#

vDataFrame

self

Examples#

Let’s begin by importing VerticaPy.

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

Let us create a vDataFrame with multiple columns:

vdf = vp.vDataFrame(
    {
        "time": [
            "1993-11-03 00:00:00",
            "1993-11-03 00:14:00",
            "1993-11-03 00:07:00",
            "1993-11-03 01:00:00",
            "1993-11-03 01:05:05",
            "1993-11-03 01:15:05",
            "1993-11-03 01:45:01",
        ],
        "val": [0., 1., 2., 4., 5., 5.5, 6.],
    },
)

We can make sure the correct data type is assigned:

vdf["time"].astype("datetime")
     time                   val
     Datetime               Numeric(4)
1    1993-11-03 00:00:00    0.0
2    1993-11-03 00:14:00    1.0
3    1993-11-03 00:07:00    2.0
4    1993-11-03 01:00:00    4.0
5    1993-11-03 01:05:05    5.0
6    1993-11-03 01:15:05    5.5
7    1993-11-03 01:45:01    6.0

Now we can create sessions based on custom time intervals:

vdf.sessionize(
    ts = "time",
    session_threshold = "15 minutes",
)
     time                   ...    val           session_id
     Datetime               ...    Numeric(4)    Integer
1    1993-11-03 00:00:00    ...    0.0           0
2    1993-11-03 00:07:00    ...    2.0           0
3    1993-11-03 00:14:00    ...    1.0           0
4    1993-11-03 01:00:00    ...    4.0           1
5    1993-11-03 01:05:05    ...    5.0           1
6    1993-11-03 01:15:05    ...    5.5           1
7    1993-11-03 01:45:01    ...    6.0           2
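
To see where these session_id values come from, the rule can be replayed in plain Python on the same seven timestamps. This is only a sketch of the rule described at the top of this page (order by the timeline, start a new session whenever the gap to the previous row exceeds the threshold). It is not VerticaPy's implementation, and the helper name assign_sessions is made up for illustration.

```python
from datetime import datetime, timedelta


def assign_sessions(timestamps, threshold):
    """Return (timestamp, session_id) pairs ordered by timestamp.

    Hypothetical helper: a new session starts whenever the gap to the
    previous timestamp is strictly greater than `threshold`.
    """
    result = []
    session_id = 0
    previous = None
    for current in sorted(timestamps):
        # Equivalent of "ts - lag(ts) > threshold"; the first row has no lag.
        if previous is not None and current - previous > threshold:
            session_id += 1
        result.append((current, session_id))
        previous = current
    return result


# Same timestamps as the vDataFrame above, in their original order.
times = [
    datetime.fromisoformat(value)
    for value in [
        "1993-11-03 00:00:00",
        "1993-11-03 00:14:00",
        "1993-11-03 00:07:00",
        "1993-11-03 01:00:00",
        "1993-11-03 01:05:05",
        "1993-11-03 01:15:05",
        "1993-11-03 01:45:01",
    ]
]

sessions = assign_sessions(times, timedelta(minutes=15))
for ts, session_id in sessions:
    print(ts, session_id)
```

The 46-minute gap before 01:00:00 and the gap of just under 30 minutes before 01:45:01 both exceed the 15-minute threshold, which is why the session identifier increases at those two rows.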

Note

This method is particularly useful when dealing with clickstream analytics, enabling the creation of sessions as part of data preparation for machine learning. It helps understand user behaviors and interactions as they navigate through applications, for example.
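
In clickstream data, sessions are usually computed per user, which is the role of the by parameter. The sketch below mimics that partitioning in plain Python. The user names, the timestamps, and the helper sessionize_by_key are all hypothetical, and the choice to restart session identifiers at 0 within each partition is an assumption of this sketch rather than something stated on this page.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize_by_key(events, threshold):
    """Group (key, timestamp) events by key and number sessions per key.

    Hypothetical helper; session ids restart at 0 for every key
    (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for key, ts in events:
        grouped[key].append(ts)

    sessions = {}
    for key, stamps in grouped.items():
        session_id = 0
        previous = None
        labelled = []
        for current in sorted(stamps):
            # A gap strictly greater than the threshold opens a new session.
            if previous is not None and current - previous > threshold:
                session_id += 1
            labelled.append((current, session_id))
            previous = current
        sessions[key] = labelled
    return sessions


# Made-up click events for two users, interleaved in arrival order.
events = [
    ("alice", datetime(2024, 1, 1, 9, 0)),
    ("bob", datetime(2024, 1, 1, 9, 5)),
    ("alice", datetime(2024, 1, 1, 9, 10)),
    ("alice", datetime(2024, 1, 1, 10, 0)),
    ("bob", datetime(2024, 1, 1, 9, 50)),
]

per_user = sessionize_by_key(events, timedelta(minutes=30))
for user, labelled in per_user.items():
    print(user, [session_id for _, session_id in labelled])
```

Without the grouping step, bob's click at 09:05 would fall between alice's clicks and hide the fact that each user has their own gaps. Partitioning keeps one user's activity from extending or splitting another user's sessions.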

See also

vDataFrame.analytic() : Use an advanced analytical function on one or two specific vDataColumns.