rel-stack
Stack-Exchange Q&A Website Database
Database Description: Stack Exchange is a network of question-and-answer websites, each covering a specific topic, in which questions, answers, and users are subject to a reputation award process. The reputation system allows the sites to be self-moderating. Our benchmark uses the stats-exchange site, derived from the raw data dump of 2023-09-12.
Database Statistics:
| Statistic | Value |
| --- | --- |
| Number of tables | 7 |
| Number of rows | 38,109,828 |
| Number of columns | 51 |
| Starting time | 2010-03-27 |
| Validation timestamp | 2019-01-01 |
| Testing timestamp | 2021-01-01 |
| Time window | 3 months |
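The validation and testing timestamps above define a temporal split. A minimal sketch of how a row's timestamp might map to a split is shown here. The half-open boundaries, the helper names `assign_split` and `label_window`, and the approximation of "3 months" as 91 days are assumptions made for illustration, not RelBench's confirmed behavior.

```python
from datetime import datetime, timedelta

# Timestamps taken from the database statistics.
VAL_TIMESTAMP = datetime(2019, 1, 1)
TEST_TIMESTAMP = datetime(2021, 1, 1)

# Assumption: the 3-month time window is approximated as 91 days.
WINDOW = timedelta(days=91)


def assign_split(ts: datetime) -> str:
    """Map a timestamp to a split, assuming half-open intervals."""
    if ts < VAL_TIMESTAMP:
        return "train"
    if ts < TEST_TIMESTAMP:
        return "val"
    return "test"


def label_window(seed_time: datetime) -> tuple[datetime, datetime]:
    """Return the interval [seed_time, seed_time + WINDOW) used for labels."""
    return seed_time, seed_time + WINDOW


print(assign_split(datetime(2015, 6, 1)))
print(label_window(VAL_TIMESTAMP))
```

Splitting by time rather than at random keeps information from the prediction window out of the training inputs.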
Database schema:
To load this relational database in RelBench, run:
from relbench.datasets import get_dataset
# Look up the dataset by its registered name.
dataset = get_dataset("rel-stack")
Dataset License: CC BY-SA 4.0.
Node Classification Tasks
user-engagement
Task Description: For each user, predict whether the user will make any votes, posts, or comments in the next 3 months.
Evaluation metric: AUROC
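As an illustration of the user-engagement target, the sketch below derives a binary label per user from a pooled list of activity events. The function name `engagement_labels`, the toy events, the 91-day window, and the half-open interval are all hypothetical choices for this example and may differ from how RelBench builds its labels.

```python
from datetime import datetime, timedelta

# Assumption: "3 months" is approximated as 91 days.
WINDOW = timedelta(days=91)


def engagement_labels(user_ids, events, seed_time, window=WINDOW):
    """Label a user 1 if they have any event in [seed_time, seed_time + window).

    `events` is an iterable of (user_id, timestamp) pairs pooled from
    votes, posts, and comments.
    """
    end = seed_time + window
    active = {uid for uid, ts in events if seed_time <= ts < end}
    return {uid: int(uid in active) for uid in user_ids}


# Toy data (hypothetical): user 1 is active inside the window,
# user 2 only before it, and user 3 only after it.
toy_events = [
    (1, datetime(2019, 1, 15)),
    (2, datetime(2018, 12, 1)),
    (3, datetime(2019, 6, 1)),
]
print(engagement_labels([1, 2, 3], toy_events, datetime(2019, 1, 1)))
```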
user-badge
Task Description: For each user, predict whether the user will receive a new badge in the next 3 months.
Evaluation metric: AUROC
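Both classification tasks are scored with AUROC, which can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The following is a minimal pairwise sketch of that definition, with ties counted as half. It is quadratic in the number of examples and meant only for illustration, not as the benchmark's evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```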
Node Regression Tasks
post-votes
Task Description: For each user post, predict how many votes it will receive in the next 3 months.
Evaluation metric: MAE
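For reference, MAE is the average absolute difference between predicted and true vote counts. A minimal sketch with made-up numbers:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all examples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical vote counts for three posts and a model's predictions.
print(mean_absolute_error([0, 2, 5], [1, 2, 3]))
```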
Link Prediction Tasks
user-post-comment
Task Description: Predict a list of existing posts that a user will comment on in the next two years.
Evaluation metric: MAP
post-post-related
Task Description: Predict a list of existing posts that users will link a given post to in the next two years.
Evaluation metric: MAP
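Both link prediction tasks are scored with MAP over ranked lists. The sketch below computes MAP at a cutoff `k`, under the assumptions that each predicted list has no duplicates and that average precision is normalized by `min(len(relevant), k)`. The cutoff and the normalization are illustrative choices and may not match the benchmark's evaluator exactly.

```python
def average_precision_at_k(predicted, relevant, k):
    """Average precision of one ranked list, truncated at k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / min(len(relevant), k)


def mean_average_precision(predictions, ground_truth, k):
    """Mean of per-query average precision over all queries in ground_truth."""
    aps = [
        average_precision_at_k(predictions.get(q, []), rel, k)
        for q, rel in ground_truth.items()
    ]
    return sum(aps) / len(aps)


# Hypothetical ranked post IDs per user, and the posts each user commented on.
preds = {"u1": [10, 11, 12], "u2": [20, 21, 22]}
truth = {"u1": [10, 12], "u2": [99]}
print(mean_average_precision(preds, truth, k=3))
```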