`rel-stack` Stack-Exchange Q&A Website Database

Database Description: Stack Exchange is a network of question-and-answer websites, each covering a specific topic, in which questions, answers, and users are subject to a reputation award process. The reputation system allows the sites to be self-moderating. In our benchmark, we use the stats-exchange site, derived from the raw data dump of 2023-09-12 [1].

Database Statistics:

Num of Tables	7
Num of Rows	38,109,828
Num of Columns	51
Starting Time	2010-03-27
Validation timestamp	2019-01-01
Testing timestamp	2021-01-01
Time window	3 months
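
The validation and testing timestamps above define a temporal split. A minimal sketch of that idea, assuming a timestamp strictly before the validation timestamp falls in training and one strictly before the testing timestamp falls in validation; the function name `assign_split` and the boundary handling are illustrative assumptions, not RelBench's own code:

```python
from datetime import datetime

# Cutoffs taken from the statistics table above.
VAL_TIMESTAMP = datetime(2019, 1, 1)
TEST_TIMESTAMP = datetime(2021, 1, 1)


def assign_split(ts: datetime) -> str:
    """Map a timestamp to a split name (hypothetical helper).

    Assumption: boundaries are half-open, so a timestamp equal to a
    cutoff belongs to the later split.
    """
    if ts < VAL_TIMESTAMP:
        return "train"
    if ts < TEST_TIMESTAMP:
        return "val"
    return "test"
```

Splitting by time rather than at random keeps information from the evaluation period out of training.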

Database schema:

To load this relational database in RelBench, do:

from relbench.datasets import get_dataset
dataset = get_dataset("rel-stack")

References:

[1] Stack Exchange Data Dump.

Dataset License: CC BY-SA 4.0.

Node Classification Tasks

`user-engagement`

Task Description: For each user, predict whether the user will make any votes, posts, or comments in the next 3 months.
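
To make the label concrete, here is a rough sketch of how a binary engagement label could be computed for one user at a given prediction time. The helper `engagement_label`, the 91-day approximation of "3 months", and the choice of a window that excludes the prediction time itself but includes its end are all assumptions for illustration, not the benchmark's own definition:

```python
from datetime import datetime, timedelta

# Assumption: "3 months" approximated as 365 // 4 = 91 days.
WINDOW = timedelta(days=365 // 4)


def engagement_label(event_times, seed_time, window=WINDOW):
    """Return 1 if any vote, post, or comment time falls in
    (seed_time, seed_time + window], else 0 (hypothetical helper)."""
    end = seed_time + window
    return int(any(seed_time < t <= end for t in event_times))
```

For example, a user whose only activity is a comment one month after the prediction time would receive label 1, while a user whose last activity predates it would receive label 0.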

Evaluation metric: AUROC

`user-badge`

Task Description: For each user, predict whether the user will receive a new badge in the next 3 months.

Evaluation metric: AUROC
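
Both classification tasks above are scored with AUROC, which can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal pure-Python sketch of that pairwise definition follows; the function name `auroc` is illustrative, and a library implementation would normally be used in practice:

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which
    the positive is scored higher, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This quadratic version is meant only to show what the metric measures; it is too slow for large evaluation sets.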

Node Regression Tasks

`post-votes`

Task Description: For each user post, predict how many votes it will receive in the next 3 months.

Evaluation metric: MAE
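
MAE is the mean absolute difference between predicted and true vote counts. A short sketch, with the function name `mae` chosen for illustration:

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Lower values are better, and the result is in the same unit as the target (here, votes).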

Link Prediction Tasks

`user-post-comment`

Task Description: For each user, predict a list of existing posts that the user will comment on in the next two years.

Evaluation metric: MAP

`post-post-related`

Task Description: For each post, predict a list of existing posts that users will link it to in the next two years.

Evaluation metric: MAP
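
Both link prediction tasks are scored with MAP (mean average precision) over ranked lists of predicted posts. The sketch below uses one common definition of average precision at a cutoff `k`, normalized by `min(k, number of relevant items)`; the benchmark's exact cutoff and normalization may differ, and the names `average_precision_at_k` and `mean_average_precision` are illustrative:

```python
def average_precision_at_k(predicted, relevant, k):
    """AP@k for one source entity (assumed definition, see lead-in).

    predicted: ranked list of candidate ids, best first.
    relevant: set of ids that are true links.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(relevant))


def mean_average_precision(all_predicted, all_relevant, k):
    """Mean of AP@k over all source entities."""
    aps = [
        average_precision_at_k(p, r, k)
        for p, r in zip(all_predicted, all_relevant)
    ]
    return sum(aps) / len(aps)
```

Because precision is accumulated at each hit, correct posts placed near the top of the list contribute more than the same posts placed lower.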
