verticapy.vDataFrame.sessionize
vDataFrame.sessionize(ts: str, by: str | list[str] | None = None, session_threshold: str = '30 minutes', name: str = 'session_id') → vDataFrame
Adds a new vDataColumn to the vDataFrame that identifies sessions (periods of user activity). Rows are ordered by ts, and a session ends when ts - lag(ts), the time elapsed since the previous row, is greater than session_threshold.
Parameters
- ts: str
vDataColumn used as the timeline. It determines the order of the data and can be a numerical or date-like (date, datetime, timestamp…) vDataColumn.
- by: SQLColumns, optional
vDataColumn(s) used to partition the data. Sessions are computed independently within each partition, for example one set of sessions per user.
- session_threshold: str, optional
Threshold that determines the end of a session. For example, if it is set to '10 minutes', a session ends when more than 10 minutes pass between two consecutive rows.
- name: str, optional
Name of the new vDataColumn that stores the session identifiers.
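The rule described above can be sketched in plain Python to make the roles of ts, by and session_threshold concrete. This is only an illustration, not how VerticaPy computes the column (that happens in-database). The function `sessionize_rows`, its arguments and the `events` data are hypothetical, and the sketch assumes that session numbers restart at 0 in every partition.

```python
from datetime import datetime, timedelta
from itertools import groupby


def sessionize_rows(rows, ts, by=None, threshold=timedelta(minutes=30), name="session_id"):
    """Hypothetical sketch: number sessions in a list of dict rows.

    A new session starts whenever ts - lag(ts) > threshold within a partition.
    """
    # Accept a single column name or a list of names, like the `by` parameter.
    by = [by] if isinstance(by, str) else list(by or [])

    def partition_key(row):
        return tuple(row[col] for col in by)

    # Order by partition first, then by the timeline, so groupby sees each
    # partition as one contiguous, time-ordered run.
    ordered = sorted(rows, key=lambda row: (partition_key(row), row[ts]))
    result = []
    for _, group in groupby(ordered, key=partition_key):
        session, previous = 0, None  # assumption: numbering restarts per partition
        for row in group:
            if previous is not None and row[ts] - previous > threshold:
                session += 1  # gap larger than the threshold: new session
            previous = row[ts]
            result.append({**row, name: session})  # copy; input rows are untouched
    return result


# Toy clickstream data, made up for illustration.
events = [
    {"user": "a", "t": datetime(2024, 1, 1, 9, 0)},
    {"user": "a", "t": datetime(2024, 1, 1, 9, 10)},
    {"user": "a", "t": datetime(2024, 1, 1, 10, 0)},
    {"user": "b", "t": datetime(2024, 1, 1, 9, 5)},
    {"user": "b", "t": datetime(2024, 1, 1, 9, 50)},
]

per_user = sessionize_rows(events, ts="t", by="user")
print([(r["user"], r["session_id"]) for r in per_user])
# [('a', 0), ('a', 0), ('a', 1), ('b', 0), ('b', 1)]
```

Without `by`, the same five events form a single timeline, so the 40-minute gap between 09:10 and 09:50 is what starts the second session.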
Returns
- vDataFrame
self
Examples
Let’s begin by importing VerticaPy.
import verticapy as vp
Hint
By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.
Let us create a vDataFrame with two columns:
vdf = vp.vDataFrame(
    {
        # Timestamps are given as strings and cast to datetime below.
        "time": [
            "1993-11-03 00:00:00",
            "1993-11-03 00:14:00",
            "1993-11-03 00:07:00",
            "1993-11-03 01:00:00",
            "1993-11-03 01:05:05",
            "1993-11-03 01:15:05",
            "1993-11-03 01:45:01",
        ],
        # One value per timestamp (7 rows).
        "val": [0., 1., 2., 4., 5., 5.5, 6.],
    },
)
We can make sure the correct data type is assigned:
vdf["time"].astype("datetime")
     time (Datetime)       val (Numeric(4))
  1  1993-11-03 00:00:00   0.0
  2  1993-11-03 00:14:00   1.0
  3  1993-11-03 00:07:00   2.0
  4  1993-11-03 01:00:00   4.0
  5  1993-11-03 01:05:05   5.0
  6  1993-11-03 01:15:05   5.5
  7  1993-11-03 01:45:01   6.0
Now we can create sessions based on a custom time interval:
vdf.sessionize(
    ts = "time",
    session_threshold = "15 minutes",
)
     time (Datetime)       val (Numeric(4))   session_id (Integer)
  1  1993-11-03 00:00:00   0.0                0
  2  1993-11-03 00:07:00   2.0                0
  3  1993-11-03 00:14:00   1.0                0
  4  1993-11-03 01:00:00   4.0                1
  5  1993-11-03 01:05:05   5.0                1
  6  1993-11-03 01:15:05   5.5                1
  7  1993-11-03 01:45:01   6.0                2
Rows are ordered by time. The 46-minute gap before 01:00:00 and the 29-minute-56-second gap before 01:45:01 both exceed 15 minutes, so each of these rows starts a new session.
Note
This method is particularly useful in clickstream analytics, where building sessions is a common data-preparation step for machine learning. It helps reveal user behaviors and interactions, for example how users navigate through an application.
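Because the comparison is a strict "greater than", the session boundaries in the example depend on how the threshold relates to the observed gaps. The plain-Python check below (not a VerticaPy call) recomputes the gaps between the example timestamps and applies the documented rule. The helper `session_ids` is hypothetical. Under this rule, the 15-minute threshold reproduces the session_id column shown above, while the default '30 minutes' would keep the last row in session 1, since its gap is only 29 minutes 56 seconds.

```python
from datetime import datetime, timedelta

# Timestamps from the example above, sorted by time.
times = [
    datetime.fromisoformat(s)
    for s in [
        "1993-11-03 00:00:00",
        "1993-11-03 00:07:00",
        "1993-11-03 00:14:00",
        "1993-11-03 01:00:00",
        "1993-11-03 01:05:05",
        "1993-11-03 01:15:05",
        "1993-11-03 01:45:01",
    ]
]


def session_ids(sorted_times, threshold):
    """Hypothetical helper: session number for each (sorted) timestamp."""
    ids, session = [], 0
    # Pair each timestamp with its predecessor (lag); the first has none.
    for previous, current in zip([None] + sorted_times[:-1], sorted_times):
        if previous is not None and current - previous > threshold:
            session += 1
        ids.append(session)
    return ids


# Gaps between consecutive rows, i.e. ts - lag(ts).
print([str(b - a) for a, b in zip(times, times[1:])])
# ['0:07:00', '0:07:00', '0:46:00', '0:05:05', '0:10:00', '0:29:56']
print(session_ids(times, timedelta(minutes=15)))  # [0, 0, 0, 1, 1, 1, 2]
print(session_ids(times, timedelta(minutes=30)))  # [0, 0, 0, 1, 1, 1, 1]
```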
See also