rel-stack Stack-Exchange Q&A Website Database

Database Description: Stack Exchange is a network of question-and-answer websites, each covering a specific topic, in which questions, answers, and users are subject to a reputation award process. The reputation system allows the sites to be self-moderating. Our benchmark uses the stats-exchange site. The database is derived from the raw data dump [1] of 2023-09-12.

Database Statistics:

Number of tables: 7
Number of rows: 38,109,828
Number of columns: 51
Starting time: 2010-03-27
Validation timestamp: 2019-01-01
Testing timestamp: 2021-01-01
Time window: 3 months
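
The validation and testing timestamps above define a temporal split. The following is a minimal sketch of how a row could be bucketed by its timestamp. The helper name `assign_split` and the half-open boundary convention are assumptions made for illustration, not RelBench's documented behavior.

```python
from datetime import date

# Cutoffs copied from the statistics above.
VAL_TIMESTAMP = date(2019, 1, 1)
TEST_TIMESTAMP = date(2021, 1, 1)


def assign_split(event_date: date) -> str:
    """Hypothetical helper: bucket a row by its timestamp.

    Assumption: boundaries are half-open, so a row dated exactly on a
    cutoff belongs to the later split.
    """
    if event_date < VAL_TIMESTAMP:
        return "train"
    if event_date < TEST_TIMESTAMP:
        return "val"
    return "test"


if __name__ == "__main__":
    for d in (date(2010, 3, 27), date(2019, 6, 1), date(2022, 1, 1)):
        print(d, assign_split(d))
```

Splitting by time rather than at random keeps information from the future out of training, which matters because every task here predicts events in a window that follows a cutoff.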

Database schema:

To load this relational database in RelBench, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-stack")

References:

[1] Stack Exchange Data Dump.

Dataset License: CC BY-SA 4.0.


Node Classification Tasks

user-engagement

Task Description: For each user, predict whether the user will cast any votes, write any posts, or leave any comments in the next 3 months.

Evaluation metric: AUROC
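
As an illustration of how a binary label for the user-engagement task could be built, the sketch below checks whether any of a user's activity timestamps falls inside the prediction window that follows a seed time. The function name `engagement_label`, the 90-day approximation of "3 months", and the window being open at the start and closed at the end are all assumptions, not the benchmark's exact definition.

```python
from datetime import datetime, timedelta

# Assumption: the 3-month window is approximated as 90 days.
WINDOW = timedelta(days=90)


def engagement_label(activity_times, seed_time, window=WINDOW):
    """Return 1 if any activity falls in (seed_time, seed_time + window], else 0.

    `activity_times` is an iterable of datetimes for one user's votes,
    posts, and comments combined.
    """
    end = seed_time + window
    return int(any(seed_time < t <= end for t in activity_times))


if __name__ == "__main__":
    seed = datetime(2019, 1, 1)
    print(engagement_label([datetime(2019, 2, 1)], seed))  # activity inside the window
    print(engagement_label([datetime(2018, 12, 1)], seed))  # activity before the seed time
```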

user-badge

Task Description: For each user, predict whether the user will receive a new badge in the next 3 months.

Evaluation metric: AUROC
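
Both classification tasks are scored with AUROC. A minimal pure-Python sketch of the metric follows, using its pairwise interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auroc` is illustrative, and this quadratic-time version is meant for clarity, not for large datasets.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    # Each (positive, negative) pair contributes 1 for a correct ordering,
    # 0.5 for a tie, and 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```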

Node Regression Tasks

post-votes

Task Description: For each user post, predict how many votes it will receive in the next 3 months.

Evaluation metric: MAE
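
The regression task is scored with mean absolute error (MAE), the average of the absolute differences between predicted and actual vote counts. A minimal sketch, with an illustrative function name:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Absolute errors are 1, 0, and 2, so the mean is 1.0.
    print(mean_absolute_error([0, 2, 5], [1, 2, 3]))  # 1.0
```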

Link Prediction Tasks

user-post-comment

Task Description: Predict a list of existing posts that a user will comment on in the next two years.

Evaluation metric: MAP

post-post-related

Task Description: Predict a list of existing posts that users will link a given post to in the next two years.

Evaluation metric: MAP
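
Both link prediction tasks are scored with mean average precision (MAP). The sketch below shows one common way to compute MAP at a cutoff k over ranked prediction lists. The function names, the default `k=10`, and the normalization by `min(len(relevant), k)` are assumptions, since the text above does not state the cutoff or the normalization the benchmark uses.

```python
def average_precision_at_k(predicted, relevant, k):
    """AP@k for one query: `predicted` is a ranked list without duplicates,
    `relevant` is the collection of true target items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumption: normalize by the best achievable number of hits within k.
    return precision_sum / min(len(relevant), k)


def mean_average_precision(predictions, ground_truth, k=10):
    """Mean of AP@k over all queries (for example, over all users)."""
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")
    aps = [average_precision_at_k(p, r, k) for p, r in zip(predictions, ground_truth)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    preds = [[1, 2, 3], [4, 5, 6]]
    truth = [{1, 3}, {6}]
    print(mean_average_precision(preds, truth, k=3))
```

Because each hit is weighted by the precision at its rank, placing true posts near the top of the list raises the score more than placing them near the bottom.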