rel-amazon
Amazon e-commerce database
Database Description: The Amazon e-commerce relational database records product information and user purchasing behavior across Amazon's e-commerce platform. Notably, it contains rich information about each product and transaction. The product table includes price and category information; the review table includes the overall rating, whether the user has actually bought the product, and the text of the review itself. We use the subset of book-related products.
Database Statistics:
Number of Tables | 3
Number of Customers | 1,850,193
Number of Reviews | 21,935,284
Number of Products | 506,012
Time range | From 1996-06-25 to 2018-09-28
Validation timestamp | 2014-01-01
Testing timestamp | 2016-01-01
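The two timestamps above suggest a temporal split. The following is a minimal sketch of how a prediction time could be assigned to a split, not RelBench's own logic: the helper name `split_of` is hypothetical, and treating each cutoff as the first instant of the later split is an assumption.

```python
from datetime import datetime

# Cutoffs taken from the statistics table above.
VAL_TIMESTAMP = datetime(2014, 1, 1)
TEST_TIMESTAMP = datetime(2016, 1, 1)


def split_of(prediction_time):
    """Assign a prediction time to a split (hypothetical helper).

    Assumption: a time equal to a cutoff belongs to the later split.
    """
    if prediction_time < VAL_TIMESTAMP:
        return "train"
    if prediction_time < TEST_TIMESTAMP:
        return "val"
    return "test"


for t in (datetime(2010, 5, 1), datetime(2014, 1, 1), datetime(2016, 1, 1)):
    print(t.date(), split_of(t))
```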
Database schema:
To load this relational database in RelBench, do:
from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
Dataset License: Not specified.
Predictive Tasks
rel-amazon-ltv
Predict the lifetime value of a user
Task Description: Predict the lifetime value (LTV) of a user, defined as the sum of the prices of the products that the user will buy and review in the next 2 years.
Time window size: 2 years.
Entity filtering: We filter on active users, defined as users who wrote a review in the two years before the timestamp.
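The label definition and the active-user filter above can be illustrated on toy data. This is a minimal sketch in plain Python, not RelBench's implementation: the record layout, the helper name `ltv_labels`, the 730-day approximation of two years, the half-open window boundaries, and keeping active users with a label of zero are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical toy data: (customer_id, product_id, review_time) and product prices.
reviews = [
    ("u1", "p1", datetime(2013, 6, 1)),  # u1 is active before the timestamp
    ("u1", "p2", datetime(2014, 3, 1)),  # falls inside the future window
    ("u1", "p3", datetime(2015, 7, 1)),  # falls inside the future window
    ("u2", "p1", datetime(2010, 1, 1)),  # u2 was last active too long ago
    ("u2", "p2", datetime(2014, 5, 1)),  # ignored because u2 is filtered out
    ("u3", "p3", datetime(2012, 9, 1)),  # u3 is active but reviews nothing later
]
prices = {"p1": 10.0, "p2": 25.0, "p3": 7.5}

WINDOW = timedelta(days=730)  # assumption: "2 years" approximated as 730 days


def ltv_labels(reviews, prices, timestamp, window=WINDOW):
    """Sum the prices of products each active user reviews in the next window."""
    # Active users: at least one review in [timestamp - window, timestamp).
    active = {c for c, _, t in reviews if timestamp - window <= t < timestamp}
    labels = {c: 0.0 for c in active}
    # Future reviews: review time in [timestamp, timestamp + window).
    for c, p, t in reviews:
        if c in active and timestamp <= t < timestamp + window:
            labels[c] += prices[p]
    return labels


labels = ltv_labels(reviews, prices, datetime(2014, 1, 1))
print(sorted(labels.items()))
```

In this toy example `u2` receives no label because the filter removes it, while `u3` stays in the table with a value of zero.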
Task significance: By accurately forecasting LTV, the e-commerce platform can gain insights into user purchasing patterns and preferences, which is essential when making strategic decisions related to marketing, product recommendations, and inventory management. Understanding a user's future purchasing behavior helps in tailoring personalized shopping experiences and optimizing product assortments, ultimately enhancing customer satisfaction and loyalty.
Machine learning task: Regression
Evaluation metric: MAE
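For reference, MAE (mean absolute error) averages the absolute difference between predicted and true values. The following is a minimal plain-Python sketch with made-up numbers; the function name is illustrative and this is not the evaluation code RelBench runs.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all examples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up LTV targets and predictions, for illustration only.
mae = mean_absolute_error([32.5, 0.0, 10.0], [30.0, 5.0, 10.0])
print(mae)
```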
To load the dataset and the split, do:
from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("rel-amazon-ltv")
task.train_table, task.val_table, task.test_table # training/validation/testing tables
rel-amazon-churn
Predict if the user churns
Task Description: Predict if the user will not buy any product in the next 2 years.
Time window size: 2 years.
Entity filtering: We filter on active users, defined as users who wrote a review in the two years before the timestamp.
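The churn label can be sketched on toy data in the same spirit. This is an illustration, not RelBench's implementation: it assumes that a purchase is observed through a review, that a label of 1 means the user churned, that two years is approximated as 730 days, and that windows are half-open; the helper name `churn_labels` is hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical toy data: (customer_id, review_time).
reviews = [
    ("u1", datetime(2013, 6, 1)),  # u1 is active before the timestamp
    ("u1", datetime(2014, 3, 1)),  # u1 comes back within the window
    ("u2", datetime(2010, 1, 1)),  # u2 is not active, so it is filtered out
    ("u3", datetime(2012, 9, 1)),  # u3 is active but never comes back
]

WINDOW = timedelta(days=730)  # assumption: "2 years" approximated as 730 days


def churn_labels(reviews, timestamp, window=WINDOW):
    """Label each active user 1 if they have no review in the next window, else 0."""
    active = {c for c, t in reviews if timestamp - window <= t < timestamp}
    returning = {c for c, t in reviews if timestamp <= t < timestamp + window}
    return {c: int(c not in returning) for c in active}


churn = churn_labels(reviews, datetime(2014, 1, 1))
print(sorted(churn.items()))
```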
Task significance: Predicting churn accurately allows companies to identify potential risks of customer attrition early on. By understanding which customers are at risk of disengagement, businesses can implement targeted interventions to improve customer retention. This may include personalized marketing, tailored offers, or enhanced customer service. Effective churn prediction enables businesses to maintain a stable customer base, ensuring sustained revenue streams and facilitating long-term planning and resource allocation.
Machine learning task: Binary classification
Evaluation metric: AP
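Assuming AP here denotes average precision, it can be computed by ranking users by predicted score and averaging the precision at the rank of each positive example. The following is a minimal plain-Python sketch with made-up scores; the function name is illustrative, and tied scores are broken by input order, which may differ from library implementations.

```python
def average_precision(y_true, y_score):
    """Mean of precision@rank taken at the rank of every positive example."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank examples from highest to lowest score (stable for ties).
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` examples
    return total / n_pos


# Made-up churn labels and scores, for illustration only.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(ap)
```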
To load the dataset and the split, do:
from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("rel-amazon-churn")
task.train_table, task.val_table, task.test_table # training/validation/testing tables