verticapy.vDataFrame.recommend
- vDataFrame.recommend(unique_id: str, item_id: str, method: Literal['count', 'avg', 'median'] = 'count', rating: str | tuple = '', ts: str | None = None, start_date: bool | float | str | timedelta | datetime = '', end_date: bool | float | str | timedelta | datetime = '') -> vDataFrame
Recommends items using the Collaborative Filtering (CF) technique. The implementation is the same as the APRIORI algorithm, but is limited to pairs of items.
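To make the "pairs of items" idea concrete, here is a minimal pure-Python sketch of pair counting, which is one way to read the count method. It is not VerticaPy's implementation: the sample rows and the helper names pair_counts and rank_by_count are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical basket data: (basket_id, item) rows.
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "B"), (2, "C"),
    (3, "A"), (3, "B"), (3, "C"),
]


def pair_counts(rows):
    """Count how often each ordered pair of distinct items shares a basket."""
    baskets = {}
    for basket_id, item in rows:
        baskets.setdefault(basket_id, set()).add(item)
    counts = Counter()
    for items in baskets.values():
        # permutations(..., 2) yields every ordered pair of distinct items.
        counts.update(permutations(sorted(items), 2))
    return counts


def rank_by_count(counts):
    """For each first item, order the second items by descending pair count."""
    ranked = {}
    for (item1, item2), cnt in counts.items():
        ranked.setdefault(item1, []).append((item2, cnt))
    for item1 in ranked:
        # Descending count; ties are broken by item name (an assumption).
        ranked[item1].sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


counts = pair_counts(rows)
ranked = rank_by_count(counts)
print(ranked["B"])
```

Restricting the search to pairs keeps the work proportional to the number of item pairs per basket, rather than growing with every larger item set as a full frequent-itemset search would.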
Parameters
- unique_id: str
Input vDataColumn corresponding to a unique ID. It serves as a primary key in another dataset. In our context, it represents an operation, such as a basket ID, which includes multiple sub-transactions.
- item_id: str
Input vDataColumn corresponding to an item ID. It is a secondary key used to compute the different pairs.
- method: str, optional
Method used to recommend.
- count:
Each item will be recommended based on frequencies of the different pairs of items.
- avg:
Each item will be recommended based on the average rating of the different item pairs with a differing second element.
- median:
Each item will be recommended based on the median rating of the different item pairs with a differing second element.
- rating: str | tuple, optional
Input vDataColumn including the item ratings. If the rating type is tuple, it must be composed of 3 elements: (r_vdf, r_item_id, r_name), where:
- r_vdf is an input vDataFrame.
- r_item_id is an input vDataColumn which must include the same IDs as item_id.
- r_name is an input vDataColumn including the item ratings.
- ts: str, optional
TS (Time Series) vDataColumn used to order the data. The vDataColumn type must be date (date, datetime, timestamp…) or numerical.
- start_date: str | PythonNumber | date, optional
Input Start Date. For example, start_date = '03-11-1993' filters out the rows where ts is earlier than November 3, 1993.
- end_date: str | PythonNumber | date, optional
Input End Date. For example, end_date = '03-11-1993' filters out the rows where ts is later than November 3, 1993.
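The start_date and end_date descriptions above suggest a simple time window: rows whose ts is before the start or after the end are dropped. The sketch below shows that reading in plain Python. It assumes that rows exactly on a boundary are kept and that an omitted bound (None here) means no limit on that side. The helpers parse_day and in_window are hypothetical and are not part of VerticaPy.

```python
from datetime import date


def parse_day(text):
    """Parse a 'YYYY-M-D' string (hypothetical helper for this sketch)."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


def in_window(ts, start_date=None, end_date=None):
    """Keep ts unless it is before start_date or after end_date."""
    if start_date is not None and ts < start_date:
        return False
    if end_date is not None and ts > end_date:
        return False
    return True


# Hypothetical rows: (transaction_id, item_id, date string).
rows = [
    (1, "A", "2021-1-1"),
    (2, "B", "2021-1-4"),
    (3, "A", "2021-1-21"),
]

start, end = parse_day("2021-1-1"), parse_day("2021-1-5")
kept = [row for row in rows if in_window(parse_day(row[2]), start, end)]
print(kept)
```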
Returns
- vDataFrame
The vDataFrame of the recommendation.
Examples
Let’s begin by importing VerticaPy.
import verticapy as vp
Hint
By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.
Let us create a vDataFrame which has some purchase transaction data:
- transaction_id:
Unique ID for a transaction.
- item_id:
The unique ID for different items that were purchased.
- rating:
Rating provided by the user for the item purchased.
vdf = vp.vDataFrame(
    {
        "transaction_id": [1, 1, 1, 2, 2, 3, 3, 3],
        "item_id": ["A", "B", "C", "B", "C", "A", "B", "C"],
        "rating": [8, 5, 1, 6, 2, 9, 4, 3],
    },
)
     transaction_id   ...   item_id       rating
     Integer                Varchar(1)    Integer
1    1                ...   A             8
2    1                ...   B             5
3    1                ...   C             1
4    2                ...   B             6
5    2                ...   C             2
6    3                ...   A             9
7    3                ...   B             4
8    3                ...   C             3
We can easily create the recommend table from the above data:
recommendations = vdf.recommend(
    unique_id = "transaction_id",
    item_id = "item_id",
    method = "avg",
    rating = "rating",
)
     item1         ...   item2         rank
     Varchar(1)          Varchar(1)    Integer
1    A             ...   B             1
2    A             ...   C             2
3    B             ...   A             1
4    B             ...   C             2
5    C             ...   A             1
6    C             ...   B             2
Note
This function is highly useful for basket analysis and can be employed to derive valuable recommendations.
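The ranks shown in the output above can be reproduced by hand under one reading of method = "avg": for every ordered pair of distinct items in the same transaction, collect the rating of the second item, average those ratings per pair, and rank the second items by descending average. The sketch below follows that reading in plain Python. It is an assumption about the scoring, not VerticaPy's actual query, and rank_by_avg is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Same data as the example: (transaction_id, item_id, rating).
rows = [
    (1, "A", 8), (1, "B", 5), (1, "C", 1),
    (2, "B", 6), (2, "C", 2),
    (3, "A", 9), (3, "B", 4), (3, "C", 3),
]


def rank_by_avg(rows):
    """Rank item2 for each item1 by the average rating of item2 (assumed scoring)."""
    by_transaction = defaultdict(list)
    for transaction_id, item, rating in rows:
        by_transaction[transaction_id].append((item, rating))

    # Collect the second item's rating for every ordered pair of distinct items.
    pair_ratings = defaultdict(list)
    for entries in by_transaction.values():
        for item1, _ in entries:
            for item2, rating2 in entries:
                if item1 != item2:
                    pair_ratings[(item1, item2)].append(rating2)

    scores = {pair: mean(values) for pair, values in pair_ratings.items()}

    result = []
    for item1 in sorted({pair[0] for pair in scores}):
        # Descending score; ties are broken by item name (an assumption).
        candidates = sorted(
            (pair for pair in scores if pair[0] == item1),
            key=lambda pair: (-scores[pair], pair[1]),
        )
        for rank, (_, item2) in enumerate(candidates, start=1):
            result.append((item1, item2, rank))
    return result


recommendations = rank_by_avg(rows)
for row in recommendations:
    print(row)
```

On this data the sketch yields the same (item1, item2, rank) triples as the table above. That agreement does not confirm that the library computes its scores this way.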
Let’s look at another example involving timestamp values:
# Create a vDataFrame with the transaction data
vdf = vp.vDataFrame(
    {
        "transaction_id": [1, 1, 1, 2, 2, 3, 3, 3],
        "item_id": ["A", "B", "C", "B", "C", "A", "B", "C"],
        "rating": [8, 5, 1, 6, 2, 9, 4, 3],
        "date": [
            "2021-1-1",
            "2021-1-1",
            "2021-1-1",
            "2021-1-4",
            "2021-1-4",
            "2021-1-21",
            "2021-1-21",
            "2021-1-21",
        ],
    },
)
     transaction_id   ...   item_id       date
     Integer                Varchar(1)    Varchar(9)
1    1                ...   A             2021-1-1
2    1                ...   B             2021-1-1
3    1                ...   C             2021-1-1
4    2                ...   B             2021-1-4
5    2                ...   C             2021-1-4
6    3                ...   A             2021-1-21
7    3                ...   B             2021-1-21
8    3                ...   C             2021-1-21
Then we can use the timestamp column to filter the recommendation results:
recommendations = vdf.recommend(
    unique_id = "transaction_id",
    item_id = "item_id",
    method = "avg",
    rating = "rating",
    ts = "date",
    start_date = "2021-1-1",
    end_date = "2021-1-5",
)
     item1         ...   item2         rank
     Varchar(1)          Varchar(1)    Integer
1    A             ...   B             1
2    A             ...   C             2
3    B             ...   A             1
4    B             ...   C             2
5    C             ...   A             1
6    C             ...   B             2
See also
vDataFrame.add_duplicates(): Add duplicates of values based on weights.