Annotated dataset

We have curated a dataset of annotated run files from different experiments that, at a more abstract level, share the same retrieval method. All of these runs are based on cross-collection relevance feedback: relevance labels and the corresponding documents from one or more source collections serve as training data for a relevance classifier, which then ranks the documents of a target collection. Some of the runs were obtained from the TREC run archive; others we reimplemented ourselves. All runs are annotated in accordance with the metadata schema outlined earlier. The dataset is hosted in an external archive on Zenodo, and some of the runs are used in the Colab demonstration.
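To make the training-then-ranking step concrete, here is a minimal, stdlib-only sketch of cross-collection relevance feedback for a single topic. It assumes a log-scaled term-frequency representation and a logistic-regression classifier trained with plain SGD as stand-ins; the reproduced experiments may use different features and learners. All function names (`featurize`, `train_logistic_regression`, `cross_collection_run`) and the run tag `xcoll` are hypothetical. The output uses the standard six-column TREC run format (topic, `Q0`, document ID, rank, score, run tag).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer (a deliberately simple assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def featurize(text):
    """Sparse term-frequency vector; 1 + log(tf) damps repeated terms."""
    counts = Counter(tokenize(text))
    return {term: 1.0 + math.log(tf) for term, tf in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, features):
    """Linear decision value of the classifier for one document."""
    weights, bias = model
    return bias + sum(weights.get(t, 0.0) * v for t, v in features.items())


def train_logistic_regression(examples, epochs=50, lr=0.1):
    """Fit a sparse logistic regression with stochastic gradient descent.

    examples: list of (feature_dict, label) pairs with label in {0, 1}.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            grad = sigmoid(score((weights, bias), features)) - label
            bias -= lr * grad
            for t, v in features.items():
                weights[t] = weights.get(t, 0.0) - lr * grad * v
    return weights, bias


def cross_collection_run(topic_id, source_judgments, target_docs, tag="xcoll", depth=1000):
    """Train on judged source documents, then rank target documents for one topic.

    source_judgments: (doc_text, relevance) pairs from one or more source collections.
    target_docs: mapping doc_id -> doc_text for the target collection.
    Returns TREC run lines, best-scoring document first.
    """
    examples = [(featurize(text), 1 if rel > 0 else 0) for text, rel in source_judgments]
    model = train_logistic_regression(examples)
    scored = [(score(model, featurize(text)), doc_id) for doc_id, text in target_docs.items()]
    # Sort by descending score; break ties by document ID for a deterministic run.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [
        f"{topic_id} Q0 {doc_id} {rank} {s:.4f} {tag}"
        for rank, (s, doc_id) in enumerate(scored[:depth], start=1)
    ]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    source = [
        ("oil spill cleanup in the gulf", 1),
        ("offshore oil spill damages coastline", 1),
        ("football championship final score", 0),
        ("new recipe for chocolate cake", 0),
    ]
    target = {
        "t1": "tanker oil spill threatens coastline",
        "t2": "chocolate cake wins baking contest",
        "t3": "championship football season opens",
    }
    for line in cross_collection_run("307", source, target):
        print(line)
```

Training and ranking are kept per topic because relevance labels in TREC-style collections are topic-specific; a full run would repeat this for every topic shared between the source and target collections and concatenate the resulting lines.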

The run dataset is compiled from the following reproduced experiments:

Grossman and Cormack @ TREC Common Core 2017 Paper | Runs
Grossman and Cormack @ TREC Common Core 2018 Paper | Runs
Yu et al. @ TREC Common Core 2018 Paper | Runs
Yu et al. @ ECIR 2019 Paper | Runs
Breuer et al. @ SIGIR 2020 Paper | Runs
Breuer et al. @ CLEF 2021 Paper | Runs

The figure below illustrates the basic idea behind cross-collection relevance feedback.