ir_metadata
An Extensible Metadata Schema
for Information Retrieval Experiments
Experimentation in information retrieval (IR) research is an inherently data-driven process that often results in experimental artifacts, so-called run files. To promote the reproducibility of IR experiments, Voorhees et al. introduced the idea of Open Runs, which proposes accompanying every run file with an open-source software repository. We build upon the idea of Open Runs and propose to make these experimental artifacts even more valuable and reproducible by annotating run files with metadata. We align the metadata schema with the PRIMAD model, which provides a conceptual taxonomy for reproducible IR experiments.
From a practical point of view, we propose to add the metadata as comments at the beginning of the run file, similar to a file header. The commonly used evaluation toolkit trec_eval allows comments in run files on lines that start with #, and official support is in development for v10.0.
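As a rough illustration of the file-header idea, the following Python sketch prepends metadata as #-prefixed comment lines to a TREC-style run and separates them again when reading. The function names (annotate_run, split_run), the flat "key: value" layout, and the example field values are assumptions made for this sketch only; they are not the actual schema or the API of any existing tool.

```python
def annotate_run(run_text: str, metadata: dict[str, str]) -> str:
    """Prepend metadata as '#'-prefixed comment lines to a run file's text.

    Hypothetical helper: the flat 'key: value' layout is an assumption,
    not the real metadata schema.
    """
    header = [f"# {key}: {value}" for key, value in metadata.items()]
    return "\n".join(header) + "\n" + run_text


def split_run(annotated: str) -> tuple[dict[str, str], list[str]]:
    """Separate comment lines (metadata) from the actual run lines."""
    metadata: dict[str, str] = {}
    rows: list[str] = []
    for line in annotated.splitlines():
        if line.startswith("#"):
            # Drop the comment marker, then split at the first colon.
            key, _, value = line.lstrip("# ").partition(":")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            rows.append(line)
    return metadata, rows


# Two lines in the usual six-column TREC run format (illustrative values).
run = "q1 Q0 doc1 1 12.5 my_run\nq1 Q0 doc2 2 11.0 my_run\n"
meta = {"actor": "example team", "method": "BM25"}

annotated = annotate_run(run, meta)
parsed_meta, parsed_rows = split_run(annotated)
```

Because the metadata lives only in comment lines, the ranking itself is unchanged, and any tool that skips lines starting with # sees the original run.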
This website introduces the outlined metadata schema for Open Runs; more details and background information can be found in our resource paper. For each PRIMAD component, the website provides checklists that can be used as a reference when annotating run files to prepare them for reproducibility.
Besides this website, we introduce the metadata and the software support of repro_eval in a Colab notebook. The notebook uses annotated runs taken from our curated dataset, which is hosted in a Zenodo archive.