rel-amazon: Amazon e-commerce database

Database Description: The Amazon e-commerce relational database records products and user purchasing behavior across Amazon's e-commerce platform. Notably, it contains rich information about each product and transaction. The product table includes price and category information; the review table includes the overall rating, whether the user has actually bought the product, and the text of the review itself. We use the subset of book-related products.

Database Statistics:

Number of tables: 3
Number of customers: 1,850,193
Number of reviews: 21,935,284
Number of products: 506,012
Time range: 1996-06-25 to 2018-09-28
Validation timestamp: 2014-01-01
Testing timestamp: 2016-01-01
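
The two timestamps above suggest a temporal split. The following is a minimal sketch, assuming that prediction times before the validation timestamp belong to training, times from the validation timestamp up to the testing timestamp belong to validation, and later times belong to testing. The `split_of` helper and its boundary handling are hypothetical illustrations, not part of the RelBench API.

```python
from datetime import date

VAL_TIMESTAMP = date(2014, 1, 1)   # validation timestamp from the statistics above
TEST_TIMESTAMP = date(2016, 1, 1)  # testing timestamp from the statistics above


def split_of(seed_time: date) -> str:
    """Assign a prediction time to a split (hypothetical helper, assumed boundaries)."""
    if seed_time < VAL_TIMESTAMP:
        return "train"
    if seed_time < TEST_TIMESTAMP:
        return "val"
    return "test"


for d in (date(2012, 1, 1), date(2014, 1, 1), date(2016, 1, 1)):
    print(d, split_of(d))
```

Splitting by time rather than at random keeps information from the validation and testing periods out of training.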

Database schema:

To load this relational database in RelBench, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")

References:

[1] Amazon Review Data Dump.

Dataset License: Not specified.


Predictive Tasks

rel-amazon-ltv: Predict the lifetime value of a user

Task Description: Predict the lifetime value (LTV) of a user, defined as the sum of the prices of the products that the user will buy and review in the next 2 years.

Time window size: 2 years.

Entity filtering: We keep only active users, defined as users who wrote at least one review in the two years before the timestamp.
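
The label definition and the entity filter can be sketched on toy data as follows. This is an illustration under stated assumptions, not the RelBench implementation: the records, the `ltv_labels` and `add_years` helpers, and the half-open window boundaries are all hypothetical, and the real column names in rel-amazon may differ.

```python
from datetime import date

# Hypothetical toy tables; real rel-amazon column names may differ.
product_price = {"p1": 10.0, "p2": 25.5, "p3": 7.0}
reviews = [  # (customer_id, product_id, review_date)
    ("u1", "p1", date(2013, 6, 1)),
    ("u1", "p2", date(2014, 3, 15)),
    ("u1", "p3", date(2015, 11, 2)),
    ("u2", "p1", date(2010, 1, 1)),
    ("u2", "p2", date(2014, 7, 4)),
]


def add_years(d: date, years: int) -> date:
    # Simple year shift; fails on Feb 29, which is enough for Jan 1 cutoffs.
    return d.replace(year=d.year + years)


def ltv_labels(timestamp: date, window_years: int = 2) -> dict:
    past_start = add_years(timestamp, -window_years)
    future_end = add_years(timestamp, window_years)
    # Entity filter (assumed half-open): a review in [timestamp - 2y, timestamp).
    active = {u for u, _, t in reviews if past_start <= t < timestamp}
    labels = {u: 0.0 for u in active}
    # Label (assumed half-open): sum of prices reviewed in [timestamp, timestamp + 2y).
    for u, p, t in reviews:
        if u in labels and timestamp <= t < future_end:
            labels[u] += product_price[p]
    return labels


print(ltv_labels(date(2014, 1, 1)))  # {'u1': 32.5}
```

In this toy example `u2` is excluded because its only earlier review falls outside the two-year activity window, even though it reviews a product afterwards.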

Task significance: By accurately forecasting LTV, the e-commerce platform can gain insights into user purchasing patterns and preferences, which is essential when making strategic decisions related to marketing, product recommendations, and inventory management. Understanding a user's future purchasing behavior helps in tailoring personalized shopping experiences and optimizing product assortments, ultimately enhancing customer satisfaction and loyalty.

Machine learning task: Regression

Evaluation metric: MAE
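
For reference, MAE (mean absolute error) averages the absolute differences between predicted and true values. A minimal plain-Python sketch, with a hypothetical helper name and made-up numbers:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all examples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy LTV values and predictions (made up for illustration).
print(mean_absolute_error([32.5, 0.0, 10.0], [30.0, 5.0, 10.0]))  # 2.5
```

Lower is better; the unit is the same as the label, here a sum of prices.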

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("rel-amazon-ltv")
task.train_table, task.val_table, task.test_table # training/validation/testing tables

rel-amazon-churn: Predict if the user churns

Task Description: Predict whether the user will not buy any product in the next 2 years.

Time window size: 2 years.

Entity filtering: We keep only active users, defined as users who wrote at least one review in the two years before the timestamp.
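
The churn label can be sketched on toy data in the same spirit. The sketch assumes that a review stands in for a purchase, since reviews are the only recorded interactions described above, and that the windows are half-open. The records and the `churn_labels` helper are hypothetical, not the RelBench implementation.

```python
from datetime import date

# Hypothetical toy review log: (customer_id, review_date).
reviews = [
    ("u1", date(2013, 6, 1)),
    ("u1", date(2014, 3, 15)),
    ("u2", date(2012, 9, 9)),
    ("u3", date(2009, 1, 1)),
]


def churn_labels(timestamp: date, window_years: int = 2) -> dict:
    # Simple year shifts; fine for Jan 1 cutoffs, would fail on Feb 29.
    past_start = timestamp.replace(year=timestamp.year - window_years)
    future_end = timestamp.replace(year=timestamp.year + window_years)
    # Entity filter: users with a review in [timestamp - 2y, timestamp).
    active = {u for u, t in reviews if past_start <= t < timestamp}
    # Users with any activity in [timestamp, timestamp + 2y) did not churn.
    returning = {u for u, t in reviews if timestamp <= t < future_end}
    return {u: int(u not in returning) for u in sorted(active)}


print(churn_labels(date(2014, 1, 1)))  # {'u1': 0, 'u2': 1}
```

A label of 1 marks a churned user; `u3` receives no label because it was not active in the two years before the timestamp.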

Task significance: Predicting churn accurately allows companies to identify potential risks of customer attrition early on. By understanding which customers are at risk of disengagement, businesses can implement targeted interventions to improve customer retention. This may include personalized marketing, tailored offers, or enhanced customer service. Effective churn prediction enables businesses to maintain a stable customer base, ensuring sustained revenue streams and facilitating long-term planning and resource allocation.

Machine learning task: Binary classification

Evaluation metric: AP
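
Assuming AP denotes average precision, the metric can be sketched as the mean of the precision values measured at the rank of each positive example, with examples ranked by predicted score. The helper below is a hypothetical plain-Python illustration that assumes no tied scores; library implementations may treat ties differently.

```python
def average_precision(y_true, y_score):
    """Mean of precision@rank taken at each positive, ranking by descending score."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / n_pos


# Toy churn labels and predicted churn scores (made up for illustration).
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(round(ap, 4))  # 0.8333
```

Higher is better, and a perfect ranking of churned users above all others gives 1.0.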

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("rel-amazon-churn")
task.train_table, task.val_table, task.test_table # training/validation/testing tables