rel-amazon: Amazon e-commerce database

Database Description: The Amazon e-commerce relational database records products, users, and purchasing behavior across Amazon's e-commerce platform. Notably, it contains rich information about each product and transaction. The product table includes price and category information; the review table includes the overall rating, whether the user actually bought the product, and the text of the review itself. We use the subset of book-related products.

Database Statistics:

Domain: E-Commerce
Number of tables: 3
Number of rows: 24,291,489
Number of columns: 15
Starting time: 1996-06-25
Validation timestamp: 2015-01-01
Testing timestamp: 2016-01-01
Time window size: 3 months
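To make the role of the timestamps and the time window concrete, here is a minimal sketch of how a label window could be derived from a seed time. The 91-day approximation of "3 months", the half-open interval, and the rule that a training window must end no later than the validation timestamp are assumptions made for illustration. They are not confirmed RelBench behavior, and the helper names are hypothetical.

```python
from datetime import date, timedelta

VAL_TIMESTAMP = date(2015, 1, 1)   # validation timestamp from the statistics above
TEST_TIMESTAMP = date(2016, 1, 1)  # testing timestamp from the statistics above
WINDOW = timedelta(days=91)        # assumption: "3 months" taken as 91 days


def label_window(seed_time):
    """Return the assumed half-open interval [seed_time, seed_time + WINDOW)."""
    return seed_time, seed_time + WINDOW


def usable_for_training(seed_time):
    """Assumed rule: a training label window must not reach past VAL_TIMESTAMP."""
    _, end = label_window(seed_time)
    return end <= VAL_TIMESTAMP


print(label_window(VAL_TIMESTAMP))
print(usable_for_training(date(2014, 10, 1)))  # True: window ends 2014-12-31
print(usable_for_training(date(2014, 12, 1)))  # False: window ends 2015-03-02
```

Under these assumptions, labels for validation are computed on the window that starts at the validation timestamp, and labels for testing on the window that starts at the testing timestamp.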

Database schema:

To load this relational database in RelBench, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")

References:

[1] Amazon Review Data Dump.

Dataset License: Not specified.


Node Classification Tasks

user-churn

Task Description: For each user, predict 1 if the customer does not review any product in the next 3 months, and 0 otherwise.

Evaluation metric: AUROC

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("user-churn")
task.train_table, task.val_table, task.test_table # training/validation/testing tables
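As an illustration of the user-churn label definition, the sketch below computes labels from a toy list of reviews. The tuple layout, the 91-day window, the half-open interval, and the function name `churn_labels` are assumptions for this example only. The real task may restrict which users are labeled, which this sketch does not model.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=91)  # assumption: "3 months" taken as 91 days


def churn_labels(user_ids, reviews, seed_time):
    """Label a user 1 if they wrote no review in [seed_time, seed_time + WINDOW), else 0.

    `reviews` is a list of (user_id, review_date) pairs (hypothetical layout).
    """
    end = seed_time + WINDOW
    active = {user for user, when in reviews if seed_time <= when < end}
    return {user: 0 if user in active else 1 for user in user_ids}


toy_reviews = [
    ("u1", date(2015, 2, 1)),  # inside the window that starts 2015-01-01
    ("u2", date(2015, 6, 1)),  # after the window ends
]
print(churn_labels(["u1", "u2", "u3"], toy_reviews, date(2015, 1, 1)))
# {'u1': 0, 'u2': 1, 'u3': 1}
```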

item-churn

Task Description: For each product, predict 1 if the product does not receive any reviews in the next 3 months, and 0 otherwise.

Evaluation metric: AUROC

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("item-churn")
task.train_table, task.val_table, task.test_table # training/validation/testing tables
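Both churn tasks are scored with AUROC. The following is a minimal pure-Python sketch of the metric, computed as the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as half. It is meant to show what the number measures. It is not the evaluation code used by the benchmark, and the function name is hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random positive example is scored above a random negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Three of the four (positive, negative) pairs are ordered correctly.
print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, so it only suits small illustrations; a rank-based implementation is preferable at the scale of this dataset.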

Node Regression Tasks

user-ltv

Task Description: For each user, predict the total dollar value of the products they buy and review in the next 3 months.

Evaluation metric: MAE

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("user-ltv")
task.train_table, task.val_table, task.test_table # training/validation/testing tables
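The regression target for user-ltv can be pictured as a sum of product prices over the reviews a user writes inside the window. The sketch below uses toy data. The tuple layout, the `prices` mapping, the 91-day window, and the function name `user_ltv` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=91)  # assumption: "3 months" taken as 91 days


def user_ltv(reviews, prices, seed_time):
    """Sum the price of every product a user reviewed in [seed_time, seed_time + WINDOW).

    `reviews` holds (user_id, product_id, review_date) tuples and `prices`
    maps product_id to a dollar price (both hypothetical layouts).
    """
    end = seed_time + WINDOW
    totals = defaultdict(float)
    for user, product, when in reviews:
        if seed_time <= when < end:
            totals[user] += prices[product]
    return dict(totals)


toy_prices = {"book_a": 10.0, "book_b": 5.5}
toy_reviews = [
    ("u1", "book_a", date(2015, 1, 15)),
    ("u1", "book_b", date(2015, 3, 1)),
    ("u2", "book_a", date(2015, 9, 1)),  # outside the window, ignored
]
print(user_ltv(toy_reviews, toy_prices, date(2015, 1, 1)))  # {'u1': 15.5}
```

Users with no reviews in the window are simply absent from the result here; whether they receive a target of 0 in the actual task tables is not stated in the description.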

item-ltv

Task Description: For each product, predict the total dollar value of the purchases and reviews it receives in the next 3 months.

Evaluation metric: MAE

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("item-ltv")
task.train_table, task.val_table, task.test_table # training/validation/testing tables
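Both LTV tasks are scored with MAE, the mean absolute difference between predicted and true dollar values. A minimal sketch of the metric, with a hypothetical function name and made-up numbers:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all examples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Absolute errors are 2.0, 1.0 and 0.0, so the mean is 1.0 dollars.
print(mean_absolute_error([10.0, 0.0, 4.0], [8.0, 1.0, 4.0]))  # 1.0
```

Because MAE is expressed in the same unit as the target, a value of 1.0 here means predictions are off by one dollar on average.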

Link Prediction Tasks

user-item-purchase

Task Description: Predict the list of distinct items each customer will purchase in the next 3 months.

Evaluation metric: MAP

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("user-item-purchase")
task.train_table, task.val_table, task.test_table # training/validation/testing tables
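The link prediction tasks are scored with MAP (mean average precision). The sketch below shows one common way to compute it over ranked item lists truncated at k. The cutoff `k`, the normalization by `min(len(relevant), k)`, and the function names are assumptions; the description above does not specify which variant the benchmark uses.

```python
def average_precision_at_k(predicted, relevant, k):
    """Average precision for one user, assuming `predicted` holds distinct items."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("average precision is undefined without relevant items")
    hits = 0
    score = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def mean_average_precision(predictions, ground_truth, k):
    """Mean of per-user average precision over users that have relevant items."""
    users = [u for u, items in ground_truth.items() if items]
    return sum(
        average_precision_at_k(predictions.get(u, []), ground_truth[u], k)
        for u in users
    ) / len(users)


toy_predictions = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}
toy_truth = {"u1": ["a", "c"], "u2": ["y"]}
# u1: (1/1 + 2/3) / 2 = 0.8333..., u2: (1/2) / 1 = 0.5
print(mean_average_precision(toy_predictions, toy_truth, k=3))
```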

user-item-rate

Task Description: Predict the list of distinct items each customer will purchase and give a 5-star review in the next 3 months.

Evaluation metric: MAP

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("user-item-rate")
task.train_table, task.val_table, task.test_table # training/validation/testing tables
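For intuition, the target lists for user-item-rate can be thought of as the distinct products a user reviewed with 5 stars inside the window. The sketch below builds such lists from toy data. The tuple layout, the 91-day window, and the function name `five_star_targets` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=91)  # assumption: "3 months" taken as 91 days


def five_star_targets(reviews, seed_time):
    """Map each user to the sorted distinct products they rated 5 in the window.

    `reviews` holds (user_id, product_id, rating, review_date) tuples
    (hypothetical layout).
    """
    end = seed_time + WINDOW
    targets = defaultdict(set)
    for user, product, rating, when in reviews:
        if seed_time <= when < end and rating == 5:
            targets[user].add(product)
    return {user: sorted(items) for user, items in targets.items()}


toy_reviews = [
    ("u1", "book_a", 5, date(2015, 1, 10)),
    ("u1", "book_a", 5, date(2015, 2, 10)),  # duplicate item, kept once
    ("u1", "book_b", 4, date(2015, 2, 20)),  # not a 5-star review
    ("u2", "book_c", 5, date(2015, 7, 1)),   # outside the window
]
print(five_star_targets(toy_reviews, date(2015, 1, 1)))  # {'u1': ['book_a']}
```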

user-item-review

Task Description: Predict the list of distinct items each customer will purchase and give a detailed review in the next 3 months.

Evaluation metric: MAP

To load the dataset and the split, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-amazon")
task = dataset.get_task("user-item-review")
task.train_table, task.val_table, task.test_table # training/validation/testing tables