
verticapy.vDataFrame.recommend

vDataFrame.recommend(unique_id: str, item_id: str, method: Literal['count', 'avg', 'median'] = 'count', rating: str | tuple = '', ts: str | None = None, start_date: bool | float | str | timedelta | datetime = '', end_date: bool | float | str | timedelta | datetime = '') → vDataFrame

Recommend items based on the Collaborative Filtering (CF) technique. The implementation is the same as the Apriori algorithm, but limited to pairs of items.
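Because the method is limited to pairs, its core counting step resembles the 2-itemset step of Apriori. The following minimal plain-Python sketch is not VerticaPy's implementation. It counts in how many baskets each unordered pair of distinct items appears, using the transactions from the examples below. `pair_support` is a hypothetical helper, and no minimum-support threshold is applied.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Count in how many baskets each unordered pair of distinct items appears.

    Hypothetical helper. baskets maps a basket ID to an iterable of item IDs.
    Duplicate items within a basket are collapsed (an assumption of this sketch).
    """
    support = Counter()
    for items in baskets.values():
        # Sorting makes each pair canonical, e.g. ("A", "B") rather than ("B", "A").
        support.update(combinations(sorted(set(items)), 2))
    return support


baskets = {1: ["A", "B", "C"], 2: ["B", "C"], 3: ["A", "B", "C"]}
support = pair_support(baskets)
print(support[("A", "B")], support[("B", "C")])  # 2 3
```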

Parameters

unique_id: str

Input vDataColumn corresponding to a unique ID. It serves as a primary key in another dataset. In this context, it identifies an operation, such as a basket ID, that groups multiple sub-transactions.

item_id: str

Input vDataColumn corresponding to an item ID. It is a secondary key used to build the item pairs.

method: str, optional

Method used to score the item pairs and rank the recommendations.

  • count:

    Items are recommended based on the frequency of each item pair.

  • avg:

    Items are recommended based on the average rating of each item pair, where the second item of the pair differs from the first.

  • median:

    Items are recommended based on the median rating of each item pair, where the second item of the pair differs from the first.
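The three methods can be sketched in plain Python. This is a minimal illustration, not VerticaPy's implementation. It rests on one assumption the documentation does not state: for avg and median, a pair (x, y) is scored with the ratings of y in the baskets where x and y co-occur. This choice reproduces the ranks in the examples below. `recommend_pairs` is a hypothetical helper, and ties are broken alphabetically here, which VerticaPy does not necessarily do.

```python
from collections import defaultdict
from statistics import mean, median


def recommend_pairs(rows, method="count"):
    """Return (item1, item2, rank) tuples from (basket_id, item_id, rating) rows.

    Hypothetical helper. Assumption: "avg"/"median" score a pair (x, y) with
    the ratings of y in the baskets containing both x and y; "count" ignores
    the ratings and counts co-occurrences.
    """
    baskets = defaultdict(list)
    for basket_id, item_id, rating in rows:
        baskets[basket_id].append((item_id, rating))

    # Ratings of the second item for every ordered pair of distinct items.
    pair_ratings = defaultdict(list)
    for entries in baskets.values():
        for x, _ in entries:
            for y, y_rating in entries:
                if x != y:
                    pair_ratings[(x, y)].append(y_rating)

    score = {"count": len, "avg": mean, "median": median}[method]
    candidates = defaultdict(list)
    for (x, y), ratings in pair_ratings.items():
        candidates[x].append((-score(ratings), y))  # negate: highest score first

    result = []
    for x in sorted(candidates):
        # Ties are broken alphabetically here; VerticaPy may order them differently.
        for rank, (_, y) in enumerate(sorted(candidates[x]), start=1):
            result.append((x, y, rank))
    return result


rows = [
    (1, "A", 8), (1, "B", 5), (1, "C", 1),
    (2, "B", 6), (2, "C", 2),
    (3, "A", 9), (3, "B", 4), (3, "C", 3),
]
print(recommend_pairs(rows, method="avg"))
```

With method="avg", this sketch yields the same ranks as the first example's output.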

rating: str | tuple, optional

Input vDataColumn containing the item ratings. If rating is a tuple, it must contain 3 elements:

(r_vdf, r_item_id, r_name) where:

r_vdf is an input vDataFrame.

r_item_id is an input vDataColumn that must contain the same IDs as item_id.

r_name is an input vDataColumn containing the item ratings.
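When rating is passed as a tuple, the ratings come from a separate table keyed by item ID, which is why r_item_id must use the same IDs as item_id. The following minimal plain-Python sketch illustrates that lookup. It assumes, without confirmation from this documentation, that a pair is scored from the ratings of its second item and that the rating table may hold several ratings per item. `score_with_rating_table` and the sample ratings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def score_with_rating_table(pairs, rating_rows, agg=mean):
    """Score ordered (item1, item2) pairs from a separate rating table.

    Hypothetical helper. rating_rows holds (r_item_id, r_name) tuples,
    mirroring the tuple form of rating. Assumption: each pair is scored by
    aggregating the ratings of item2.
    """
    ratings = defaultdict(list)
    for r_item_id, r_value in rating_rows:
        ratings[r_item_id].append(r_value)
    # Pairs whose item2 has no entry in the rating table are dropped in this
    # sketch; matching only works if both tables use the same item IDs.
    return {(x, y): agg(ratings[y]) for x, y in pairs if y in ratings}


# Hypothetical rating table: several ratings per item are allowed.
rating_table = [("A", 9), ("A", 7), ("B", 4), ("C", 2)]
pairs = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "D")]
print(score_with_rating_table(pairs, rating_table))
```

The pair ("B", "D") is absent from the result because item D has no rating.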

ts: str, optional

TS (Time Series) vDataColumn used, with start_date and end_date, to filter the data. The vDataColumn type must be date (date, datetime, timestamp…) or numerical.

start_date: str | PythonNumber | date, optional

Input start date. For example, start_date = '1993-11-03' filters out the rows where ts is earlier than November 3, 1993.

end_date: str | PythonNumber | date, optional

Input end date. For example, end_date = '1993-11-03' filters out the rows where ts is later than November 3, 1993.
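The start_date and end_date filter can be sketched as follows. This is plain Python, not VerticaPy code. Treating both bounds as inclusive is an assumption, and `filter_by_date` is a hypothetical helper. Parsing ts into datetime.date values keeps the comparison chronological, in line with the requirement that ts be a date or numerical column.

```python
from datetime import date


def filter_by_date(rows, start_date=None, end_date=None):
    """Keep rows whose ts (third field) lies within [start_date, end_date].

    Hypothetical helper. Assumption: both bounds are inclusive and either
    bound may be omitted (None).
    """
    kept = []
    for row in rows:
        ts = row[2]
        if start_date is not None and ts < start_date:
            continue  # earlier than start_date: filtered out
        if end_date is not None and ts > end_date:
            continue  # later than end_date: filtered out
        kept.append(row)
    return kept


# (transaction_id, item_id, ts) rows with ts already parsed as dates.
rows = [
    (1, "A", date(2021, 1, 1)),
    (2, "B", date(2021, 1, 4)),
    (3, "A", date(2021, 1, 21)),
]
print([r[0] for r in filter_by_date(rows, date(2021, 1, 1), date(2021, 1, 5))])  # [1, 2]
```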

Returns

vDataFrame

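A vDataFrame of the recommendations, with columns item1, item2, and rank. For each item1, rank orders the candidate items (item2), starting at 1.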
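Each row of the result pairs an item (item1) with a candidate (item2) and a rank that restarts at 1 for every item1, as the example outputs below show. The following minimal plain-Python sketch keeps the top k candidates per item. It assumes the rows have already been fetched into (item1, item2, rank) tuples. `top_k` is a hypothetical helper, not part of VerticaPy.

```python
def top_k(recommendations, k=1):
    """Group (item1, item2, rank) rows into {item1: [item2, ...]} with rank <= k.

    Hypothetical helper; the rows are assumed to be already fetched from the
    returned vDataFrame (that step is not shown).
    """
    result = {}
    # Sort by item1, then rank, so each list comes out best-first.
    for item1, item2, rank in sorted(recommendations, key=lambda row: (row[0], row[2])):
        if rank <= k:
            result.setdefault(item1, []).append(item2)
    return result


# The rows returned in the first example below.
rows = [("A", "B", 1), ("A", "C", 2), ("B", "A", 1), ("B", "C", 2), ("C", "A", 1), ("C", "B", 2)]
print(top_k(rows))  # {'A': ['B'], 'B': ['A'], 'C': ['A']}
```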

Examples

Let’s begin by importing VerticaPy.

import verticapy as vp

Hint

By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.

Let us create a vDataFrame containing some purchase transaction data:

  • transaction_id:

    Unique ID for a transaction.

  • item_id:

    Unique ID of the purchased item.

  • rating:

    Rating provided by the user for the item purchased.

vdf = vp.vDataFrame(
    {
        "transaction_id": [1, 1, 1, 2, 2, 3, 3, 3],
        "item_id": ["A", "B", "C", "B", "C", "A", "B", "C"],
        "rating": [8, 5, 1, 6, 2, 9, 4, 3],
    },
)

    transaction_id  item_id     rating
    Integer         Varchar(1)  Integer
1   1               A           8
2   1               B           5
3   1               C           1
4   2               B           6
5   2               C           2
6   3               A           9
7   3               B           4
8   3               C           3

We can now create the recommendation table from the data above:

recommendations = vdf.recommend(
    unique_id = "transaction_id",
    item_id = "item_id",
    method = "avg",
    rating = "rating",
)

    item1       item2       rank
    Varchar(1)  Varchar(1)  Integer
1   A           B           1
2   A           C           2
3   B           A           1
4   B           C           2
5   C           A           1
6   C           B           2
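The ranks above can be checked by hand under an assumption that the documentation does not state explicitly. With method = "avg", a pair (x, y) is assumed to be scored by the mean rating of y over the transactions that contain both x and y, and a higher score gives a better (smaller) rank. Using the ratings from the input table above:

```latex
\begin{aligned}
\text{score}(A,B) &= \tfrac{5+4}{2} = 4.5, & \text{score}(A,C) &= \tfrac{1+3}{2} = 2,\\
\text{score}(B,A) &= \tfrac{8+9}{2} = 8.5, & \text{score}(B,C) &= \tfrac{1+2+3}{3} = 2,\\
\text{score}(C,A) &= \tfrac{8+9}{2} = 8.5, & \text{score}(C,B) &= \tfrac{5+6+4}{3} = 5.
\end{aligned}
```

For each item1, the candidate with the higher score is the one ranked 1 in the output.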

Note

This function is well suited to basket analysis. For each item, the rank column orders the items that appear with it in the same baskets, so the rows with rank 1 give the top recommendation for each item.

Let’s look at another example, this time with a date column:

# Create a vDataFrame with the transaction data
vdf = vp.vDataFrame(
    {
        "transaction_id": [1, 1, 1, 2, 2, 3, 3, 3],
        "item_id": ["A", "B", "C", "B", "C", "A", "B", "C"],
        "rating": [8, 5, 1, 6, 2, 9, 4, 3],
        "date": [
            "2021-1-1",
            "2021-1-1",
            "2021-1-1",
            "2021-1-4",
            "2021-1-4",
            "2021-1-21",
            "2021-1-21",
            "2021-1-21",
        ],
    },
)

    transaction_id  item_id     date
    Integer         Varchar(1)  Varchar(9)
1   1               A           2021-1-1
2   1               B           2021-1-1
3   1               C           2021-1-1
4   2               B           2021-1-4
5   2               C           2021-1-4
6   3               A           2021-1-21
7   3               B           2021-1-21
8   3               C           2021-1-21

We can then use the date column to restrict the transactions used to compute the recommendations:

recommendations = vdf.recommend(
    unique_id = "transaction_id",
    item_id = "item_id",
    method = "avg",
    rating = "rating",
    ts = "date",
    start_date = "2021-1-1",
    end_date = "2021-1-5",
)

    item1       item2       rank
    Varchar(1)  Varchar(1)  Integer
1   A           B           1
2   A           C           2
3   B           A           1
4   B           C           2
5   C           A           1
6   C           B           2
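One caveat about this example: the date column is stored as Varchar(9), whereas ts is documented above as a date or numerical column. The plain-Python sketch below (not VerticaPy code) shows why the type can matter. Compared as text, '2021-1-21' falls between '2021-1-1' and '2021-1-5', so a textual filter would also keep transaction 3. Under the same assumed avg scoring (mean rating of the second item), both filters happen to yield the ranks shown above for this small dataset, so the output alone cannot tell them apart. The helpers `parse` and `avg_ranks` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Rows of the second example: (transaction_id, item_id, rating, date as text).
rows = [
    (1, "A", 8, "2021-1-1"), (1, "B", 5, "2021-1-1"), (1, "C", 1, "2021-1-1"),
    (2, "B", 6, "2021-1-4"), (2, "C", 2, "2021-1-4"),
    (3, "A", 9, "2021-1-21"), (3, "B", 4, "2021-1-21"), (3, "C", 3, "2021-1-21"),
]


def parse(text):
    """Parse a 'YYYY-M-D' string (no zero padding) into a datetime.date."""
    year, month, day = map(int, text.split("-"))
    return date(year, month, day)


def avg_ranks(rows):
    """Map item1 -> item2 candidates, best first (assumed 'avg' scoring)."""
    baskets = defaultdict(list)
    for tid, item, rating, _ in rows:
        baskets[tid].append((item, rating))
    ratings = defaultdict(list)
    for entries in baskets.values():
        for x, _ in entries:
            for y, r in entries:
                if x != y:
                    ratings[(x, y)].append(r)
    candidates = defaultdict(list)
    for (x, y), rs in ratings.items():
        candidates[x].append((-mean(rs), y))  # negate: highest mean first
    return {x: [y for _, y in sorted(c)] for x, c in sorted(candidates.items())}


# Chronological filter: keeps transactions 1 and 2 only.
by_date = [r for r in rows if date(2021, 1, 1) <= parse(r[3]) <= date(2021, 1, 5)]
# Text filter: "2021-1-21" sorts before "2021-1-5", so transaction 3 is kept too.
by_text = [r for r in rows if "2021-1-1" <= r[3] <= "2021-1-5"]

print(sorted({r[0] for r in by_date}), sorted({r[0] for r in by_text}))  # [1, 2] [1, 2, 3]
print(avg_ranks(by_date) == avg_ranks(by_text))  # True
```

Storing the column as a date type, or as zero-padded ISO strings such as '2021-01-21' (which sort chronologically as text), avoids the ambiguity.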

See also

vDataFrame.add_duplicates() : Add duplicates of values based on weights.