verticapy.vDataFrame.recommend

Recommends items using the Collaborative Filtering (CF) technique. The implementation follows the Apriori algorithm, but is limited to pairs of items.

Parameters

unique_id: str

Input vDataColumn corresponding to a unique ID. It serves as a primary key in another dataset. In our context, it represents an operation, such as a basket ID, which includes multiple sub-transactions.

item_id: str

Input vDataColumn corresponding to an item ID. It is a secondary key used to compute the different pairs.

method: str, optional

Method used to compute the recommendations.

count:
Each item will be recommended based on frequencies of the different pairs of items.
avg:
Each item will be recommended based on the average rating of the different item pairs with a differing second element.
median:
Each item will be recommended based on the median rating of the different item pairs with a differing second element.
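
The three methods can be illustrated without a database. The following is a minimal pure-Python sketch, not VerticaPy's implementation: recommend_pairs is a hypothetical helper, and it assumes that avg and median aggregate the rating of the second item of each pair, that higher scores rank first, and that ties are broken alphabetically.

```python
from collections import defaultdict
from itertools import permutations
from statistics import mean, median


def recommend_pairs(rows, method="count"):
    """Hypothetical helper, for illustration only.

    rows: iterable of (unique_id, item_id, rating) tuples.
    Returns {item1: [item2, ...]} ordered from best to worst candidate.
    """
    # Group the rows by basket: {unique_id: {item_id: rating}}.
    baskets = defaultdict(dict)
    for unique_id, item, rating in rows:
        baskets[unique_id][item] = rating

    # For every ordered pair (item1, item2) found in the same basket,
    # collect the ratings of the second element.
    pair_ratings = defaultdict(list)
    for basket in baskets.values():
        for item1, item2 in permutations(basket, 2):
            pair_ratings[(item1, item2)].append(basket[item2])

    # "count" scores a pair by its frequency; "avg" and "median" score it
    # by the aggregated rating of the second element (assumption).
    score = {"count": len, "avg": mean, "median": median}[method]
    candidates = defaultdict(list)
    for (item1, item2), ratings in pair_ratings.items():
        candidates[item1].append((score(ratings), item2))

    # Highest score first; ties broken alphabetically (assumption).
    return {
        item1: [item2 for _, item2 in sorted(c, key=lambda x: (-x[0], x[1]))]
        for item1, c in candidates.items()
    }


rows = list(zip(
    [1, 1, 1, 2, 2, 3, 3, 3],
    ["A", "B", "C", "B", "C", "A", "B", "C"],
    [8, 5, 1, 6, 2, 9, 4, 3],
))
by_avg = recommend_pairs(rows, method="avg")
by_count = recommend_pairs(rows, method="count")
```

With this data, by_avg orders the candidates the same way as the rank column of the first example on this page, while by_count puts C ahead of A for item B because the pair (B, C) occurs in three baskets and (B, A) in only two.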

rating: str | tuple, optional

Input vDataColumn containing the item ratings. If rating is a tuple, it must be composed of 3 elements:

(r_vdf, r_item_id, r_name) where:

r_vdf is an input vDataFrame.

r_item_id is an input vDataColumn that must contain the same IDs as item_id.

r_name is an input vDataColumn containing the item ratings.
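
One way to picture the tuple form is as a lookup table of ratings keyed by item ID that is joined onto the transactions. The sketch below is plain Python under that assumption; attach_ratings is a hypothetical helper, and dropping items with no rating (an inner join) is a guess, not documented behavior.

```python
def attach_ratings(transactions, rating_rows):
    """Hypothetical helper, for illustration only.

    transactions: list of (unique_id, item_id) tuples.
    rating_rows:  list of (r_item_id, rating) tuples, playing the role of
                  the r_item_id and r_name columns of r_vdf.
    """
    rating_by_item = dict(rating_rows)
    # Keep only the items that have a rating (inner join, an assumption).
    return [
        (unique_id, item, rating_by_item[item])
        for unique_id, item in transactions
        if item in rating_by_item
    ]


transactions = [(1, "A"), (1, "B"), (2, "B"), (2, "C")]
rating_rows = [("A", 8), ("B", 5)]
rated = attach_ratings(transactions, rating_rows)
```

Here rated contains the three rows for items A and B, each carrying the rating looked up from rating_rows.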

ts: str, optional

TS (Time Series) vDataColumn used to order the data. The vDataColumn type must be date (date, datetime, timestamp…) or numerical.

start_date: str | PythonNumber | date, optional

Input Start Date. For example, start_date = '03-11-1993' will filter out the data where ts is earlier than November 3rd, 1993.

end_date: str | PythonNumber | date, optional

Input End Date. For example, end_date = '03-11-1993' will filter out the data where ts is later than November 3rd, 1993.
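
The combined effect of start_date and end_date can be sketched as a simple window over ts. This is a plain-Python illustration rather than the library's code: in_window is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```python
from datetime import date


def in_window(ts, start_date=None, end_date=None):
    """Hypothetical helper: True if ts lies inside the optional window."""
    if start_date is not None and ts < start_date:
        return False  # earlier than start_date: filtered out
    if end_date is not None and ts > end_date:
        return False  # later than end_date: filtered out
    return True


# (transaction_id, item_id, ts) rows.
rows = [
    (1, "A", date(2021, 1, 1)),
    (2, "B", date(2021, 1, 4)),
    (3, "A", date(2021, 1, 21)),
]
kept = [r for r in rows if in_window(r[2], date(2021, 1, 1), date(2021, 1, 5))]
kept_ids = [r[0] for r in kept]
```

Only transactions 1 and 2 fall inside the window, so only their item pairs would contribute to the recommendations.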

Returns

vDataFrame: The vDataFrame of the recommendation.

Examples

Let’s begin by importing VerticaPy.

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

Let us create a vDataFrame which has some purchase transaction data:

transaction_id:
Unique ID for a transaction.
item_id:
The unique ID for different items that were purchased.
rating:
Rating provided by the user for the item purchased.

vdf = vp.vDataFrame(
    {
        "transaction_id": [1, 1, 1, 2, 2, 3, 3, 3],
        "item_id": ["A", "B", "C", "B", "C", "A", "B", "C"],
        "rating": [8, 5, 1, 6, 2, 9, 4, 3],
    },
)

	123 transaction_id Integer 100%	...	Abc item_id Varchar(1) 100%	123 rating Integer 100%
1	1	...	A	8
2	1	...	B	5
3	1	...	C	1
4	2	...	B	6
5	2	...	C	2
6	3	...	A	9
7	3	...	B	4
8	3	...	C	3

We can easily create the recommendation table from the above data:

recommendations = vdf.recommend(
    unique_id = "transaction_id",
    item_id = "item_id",
    method = "avg",
    rating = "rating",
)

	Abc item1 Varchar(1) 100%	...	Abc item2 Varchar(1) 100%	123 rank Integer 100%
1	A	...	B	1
2	A	...	C	2
3	B	...	A	1
4	B	...	C	2
5	C	...	A	1
6	C	...	B	2

Note

This function is highly useful for basket analysis and can be employed to derive valuable recommendations.

Let’s look at another example involving timestamp values:

# Create a vDataFrame with the transaction data
vdf = vp.vDataFrame(
    {
        "transaction_id": [1, 1, 1, 2, 2, 3, 3, 3],
        "item_id": ["A", "B", "C", "B", "C", "A", "B", "C"],
        "rating": [8, 5, 1, 6, 2, 9, 4, 3],
        "date": [
            "2021-1-1",
            "2021-1-1",
            "2021-1-1",
            "2021-1-4",
            "2021-1-4",
            "2021-1-21",
            "2021-1-21",
            "2021-1-21",
        ],
    },
)

	123 transaction_id Integer 100%	...	Abc item_id Varchar(1) 100%	Abc date Varchar(9) 100%
1	1	...	A	2021-1-1
2	1	...	B	2021-1-1
3	1	...	C	2021-1-1
4	2	...	B	2021-1-4
5	2	...	C	2021-1-4
6	3	...	A	2021-1-21
7	3	...	B	2021-1-21
8	3	...	C	2021-1-21

Then we can use the timestamp column to filter the recommendation results:

recommendations = vdf.recommend(
    unique_id = "transaction_id",
    item_id = "item_id",
    method = "avg",
    rating = "rating",
    ts = "date",
    start_date = "2021-1-1",
    end_date = "2021-1-5",
)

	Abc item1 Varchar(1) 100%	...	Abc item2 Varchar(1) 100%	123 rank Integer 100%
1	A	...	B	1
2	A	...	C	2
3	B	...	A	1
4	B	...	C	2
5	C	...	A	1
6	C	...	B	2