
Annotated dataset


We have curated a dataset of annotated run files resulting from different experiments that, at an abstract level, share the same retrieval method. All of these runs are based on cross-collection relevance feedback: relevance labels and the corresponding documents from one or more source collections serve as training data for a relevance classifier that then ranks the documents of a target collection. While some of the runs were available from the TREC run archive, others were reimplemented by us. All of the runs are annotated in accordance with the outlined metadata schema. The dataset is hosted in an external Zenodo archive, and some of the runs are used for the demonstration on Colab.
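The core idea can be sketched in a few lines: documents with known relevance labels from a source collection train a classifier, whose scores then rank the unjudged documents of a target collection. The sketch below is a minimal illustration with toy data and a TF-IDF plus logistic regression pipeline; the actual runs in the dataset use TREC collections, qrels, and the specific methods of the cited papers.

```python
# Hedged sketch of cross-collection relevance feedback: labels and
# documents from a source collection train a relevance classifier
# that then ranks the documents of a target collection.
# Toy data for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Source collection: documents with relevance labels (1 = relevant).
source_docs = [
    "hurricane damage relief efforts",
    "storm flooding emergency response",
    "recipe for chocolate cake",
    "football match final score",
]
source_labels = [1, 1, 0, 0]

# Target collection: unjudged documents to be ranked.
target_docs = [
    "emergency aid after hurricane flooding",
    "baking a sponge cake at home",
    "tropical storm recovery operations",
]

# A shared feature space across both collections.
vectorizer = TfidfVectorizer()
vectorizer.fit(source_docs + target_docs)

# Train the relevance classifier on the source collection.
clf = LogisticRegression()
clf.fit(vectorizer.transform(source_docs), source_labels)

# Rank target documents by the classifier's relevance probability.
scores = clf.predict_proba(vectorizer.transform(target_docs))[:, 1]
ranking = sorted(zip(target_docs, scores), key=lambda x: -x[1])
for doc, score in ranking:
    print(f"{score:.3f}  {doc}")
```

In a real setting, the ranked list produced this way would be written out as a TREC run file for the target collection's topics.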

The run dataset is compiled from the following reproduced experiments:

  • Grossman and Cormack @ TREC Common Core 2017 Paper | Runs

  • Grossman and Cormack @ TREC Common Core 2018 Paper | Runs

  • Yu et al. @ TREC Common Core 2018 Paper | Runs

  • Yu et al. @ ECIR 2019 Paper | Runs

  • Breuer et al. @ SIGIR 2020 Paper | Runs

  • Breuer et al. @ CLEF 2021 Paper | Runs

The figure below illustrates the principal idea behind cross-collection relevance feedback.