Method
The Method describes how query-document pairs are mapped to a ranking score. The metadata about the Method should state whether the run is automatic or manual, i.e., whether it was produced without or with a human in the loop, and should describe the indexing and the retrieval. Indexing usually comprises several processing steps, including tokenization, stemming, and stopword removal, whereas the retrieval method assigns a score to each query-document pair. Modern retrieval pipelines are often implemented as multi-stage rankings, which our metadata schema supports as well. If a ranking method reranks the output of a previous retrieval method, this can be reported with the reranks entry, which refers to the name of the previous method. Likewise, interpolated scores can be reported.
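To make the mapping from a query-document pair to a score concrete, here is a minimal Python sketch of one common BM25 variant: term-frequency saturation controlled by k1, document-length normalization controlled by b, and a smoothed IDF that stays positive. The function name bm25_score and the toy collection are hypothetical; the defaults k1 = 0.9 and b = 0.4 are taken from the example at the end of this section. Real toolkits (e.g., Lucene's BM25Similarity) may differ in details such as IDF smoothing or how document lengths are stored.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=0.9, b=0.4):
    """Score one query-document pair with a common BM25 variant (illustrative sketch)."""
    tf = Counter(doc_terms)
    # Length normalization: b = 0 disables it, b = 1 applies it fully.
    length_norm = 1 - b + b * len(doc_terms) / avg_doc_len
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue  # query terms absent from the document contribute nothing
        df = doc_freq[term]
        # Smoothed IDF that stays positive even for very common terms.
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
    return score


# Hypothetical toy collection, already tokenized.
docs = {
    "d1": "retrieval of documents with bm25".split(),
    "d2": "neural reranking of documents".split(),
    "d3": "cooking recipes".split(),
}
doc_freq = Counter(term for terms in docs.values() for term in set(terms))
avg_doc_len = sum(len(terms) for terms in docs.values()) / len(docs)

query = ["bm25", "documents"]
scores = {
    doc_id: bm25_score(query, terms, doc_freq, len(docs), avg_doc_len)
    for doc_id, terms in docs.items()
}
for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(doc_id, round(score, 3))
```

Here d1 matches both query terms and ranks first, d2 matches one, and d3 matches none and scores 0.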
Checklist
method → automatic
Description: Boolean value indicating whether the run is automatic (true) or manual (false).
Type: Scalar
Encoding: Boolean; !!bool

method → score ties
Description: Name or description of the method used to break score ties in the ranking.
Type: Scalar
Encoding: UTF-8 encoded string of characters (RFC 3629); !!str
Naming convention: (reverse) alphabetical order, external collection
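As an illustration of the "(reverse) alphabetical order" convention, the following Python sketch sorts a run by descending score and breaks ties by document ID. It assumes that the document ID is the tie-breaking key; the function name rank and the sample run are hypothetical.

```python
def rank(scores, reverse_alphabetical=True):
    """Sort (doc_id, score) pairs by descending score, breaking ties by doc ID."""
    if reverse_alphabetical:
        # Sorting (score, doc_id) in reverse gives descending scores and,
        # among equal scores, descending (reverse alphabetical) doc IDs.
        return sorted(scores.items(), key=lambda item: (item[1], item[0]), reverse=True)
    # Descending scores, then ascending (alphabetical) doc IDs among ties.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


run = {"doc-a": 1.5, "doc-b": 2.0, "doc-c": 1.5}
print([doc for doc, _ in rank(run)])         # ['doc-b', 'doc-c', 'doc-a']
print([doc for doc, _ in rank(run, False)])  # ['doc-b', 'doc-a', 'doc-c']
```

Because the two conventions produce different orders for tied documents, reporting the tie-breaking method helps others reproduce rank-based evaluation results exactly.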

method → indexing → tokenizer
Description: Name of the tokenizer. If available, it can be reported by its class name in the software package (see the example below).
Type: Scalar
Encoding: UTF-8 encoded string of characters (RFC 3629); !!str
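The indexing fields (tokenizer, stemmer, stopwords) describe successive steps of a text-processing pipeline. The Python sketch below chains three hypothetical toy components in that spirit: a regular-expression tokenizer, a tiny hand-picked stopword list, and a suffix stripper that only removes a plural "s". The stripper is deliberately NOT the Porter algorithm or any other named stemmer; all names here are illustrative.

```python
import re

# Hypothetical, hand-picked stopword list (not any toolkit's list).
STOPWORDS = frozenset({"a", "an", "and", "of", "the", "with"})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


def stem(token):
    """Toy suffix stripper (NOT Porter): removes a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    """Tokenize, remove stopwords, then stem, in that order."""
    return [stem(token) for token in remove_stopwords(tokenize(text))]


print(analyze("The Rankings of Documents with BM25"))  # ['ranking', 'document', 'bm25']
```

Swapping any one component (or changing the order of the steps) changes the indexed terms, which is why each component is reported separately.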

method → indexing → stemmer
Description: Name of the stemmer. If possible, the stemmer should be reported by its class name in the software package (see the example below). If this is not possible, it should follow the naming convention below.
Type: Scalar
Encoding: UTF-8 encoded string of characters (RFC 3629); !!str
Naming convention: Porter, Krovetz, Lovins, Snowball, n-grams, ...

method → indexing → stopwords
Description: Name of the stopword list. If possible, the stopword list should be reported by its resource name in the software package or by a URI (see the example below). If this is not possible, it should follow the naming convention below, e.g., by naming the corresponding retrieval toolkit.
Type: Scalar
Encoding: UTF-8 encoded string of characters (RFC 3629); !!str
Naming convention: Indri, Lucene, Smart, Terrier, ...

method → retrieval
Description: The retrieval approach is documented as a sequence of mappings, where each mapping represents one component of a ranking pipeline. Multi-stage ranking pipelines can therefore be reported by letting later components refer to previous ranking stages.
Type: Sequence of mappings; !!seq [!!map, !!map, ...]
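Since later stages refer to earlier ones by name, a simple consistency check is that every referenced name belongs to a stage listed before it. The Python sketch below assumes the retrieval sequence has already been parsed into a list of dicts (e.g., by a YAML loader) and that the order of the sequence reflects the order of the stages. It checks the reranks and interpolates keys from this checklist plus the ensembles key used in the example; the function name check_references is hypothetical.

```python
REFERENCE_KEYS = ("reranks", "interpolates", "ensembles")


def check_references(stages):
    """Return a list of problems; an empty list means all references are valid."""
    seen = set()
    problems = []
    for stage in stages:
        name = stage.get("name")
        for key in REFERENCE_KEYS:
            refs = stage.get(key, [])
            if isinstance(refs, str):  # reranks holds a single name
                refs = [refs]
            for ref in refs:
                if ref not in seen:
                    problems.append(f"{name!r}: {key} refers to unknown or later stage {ref!r}")
        if name in seen:
            problems.append(f"duplicate stage name {name!r}")
        seen.add(name)
    return problems


# Names and references transcribed from the example at the end of this section.
example_stages = [
    {"name": "bm25"},
    {"name": "axiomatic reranker", "reranks": "bm25"},
    {"name": "lr reranker", "reranks": "axiomatic reranker"},
    {"name": "svm reranker", "reranks": "axiomatic reranker"},
    {"name": "lgb reranker", "reranks": "axiomatic reranker"},
    {"name": "ensemble", "ensembles": ["lr reranker", "svm reranker", "lgb reranker"]},
    {"name": "interpolation", "interpolates": ["axiomatic reranker", "ensemble"]},
]
print(check_references(example_stages))  # []

# A reranker listed before the stage it reranks is flagged.
broken_stages = [{"name": "rerank", "reranks": "bm25"}, {"name": "bm25"}]
print(check_references(broken_stages))
```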

method → retrieval → name
Description: Name of the ranking stage component.
Type: Scalar
Encoding: UTF-8 encoded string of characters (RFC 3629); !!str
Naming convention: bm25, rm3, ax (axiomatic reranking), piv (pivoted normalization method), dir (Dirichlet prior method), monobert

method → retrieval → method
Description: Class name of the retrieval method.
Type: Scalar
Encoding: UTF-8 encoded string of characters (RFC 3629); !!str

method → retrieval → params
Description: Parameter(s) of the retrieval method. Each parameter is reported as a key-value pair whose key is the parameter name and whose value is a decimal integer or floating-point number.
Type: Scalar
Encoding: A decimal integer or floating-point number; !!int or !!float
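A minimal Python sketch of checking this encoding, assuming the parameters of a stage have already been parsed into a dict (e.g., by a YAML loader, which maps !!int, !!float, and !!bool to Python int, float, and bool). One subtlety: Python's bool is a subclass of int, so a YAML true would pass a naive isinstance(value, int) test; the sketch therefore rejects booleans explicitly. The function name invalid_params is hypothetical.

```python
def invalid_params(params):
    """Return the names of parameters whose values are not an int or a float."""
    return [
        key
        for key, value in params.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float))
    ]


print(invalid_params({"b": 0.4, "k1": 0.9, "rerankCutoff": 20}))  # []
print(invalid_params({"k1": "0.9", "flag": True}))  # ['k1', 'flag']
```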

method → retrieval → reranks
Description: Name of the component whose output is reranked.
Type: Scalar
Encoding: UTF-8 encoded string of characters (RFC 3629); !!str
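The following Python sketch illustrates what a stage with "reranks: bm25" does conceptually: it takes the ranking produced by the referenced stage, rescores the top documents with its own scoring function, and re-sorts them. The cutoff mirrors the role of the rerankCutoff parameter in the example, but the function name rerank and the choice to keep documents below the cutoff in their original order are assumptions of this sketch, not the behavior of any particular toolkit.

```python
def rerank(previous_ranking, score_fn, cutoff):
    """Rescore the top `cutoff` documents of a previous stage.

    Documents below the cutoff keep their original order and are
    appended after the reranked head (an assumption of this sketch).
    """
    head = previous_ranking[:cutoff]
    tail = previous_ranking[cutoff:]
    reranked_head = sorted(head, key=score_fn, reverse=True)
    return reranked_head + tail


first_stage = ["d1", "d2", "d3", "d4"]  # e.g., the output of bm25
new_scores = {"d1": 0.1, "d2": 0.9, "d3": 0.5, "d4": 1.0}  # hypothetical reranker scores
reranked = rerank(first_stage, new_scores.get, cutoff=3)
print(reranked)  # ['d2', 'd3', 'd1', 'd4']
```

Note that d4 stays last despite its high new score, because it lies outside the cutoff and is never rescored.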

method → retrieval → interpolates
Description: Names of the components whose scores are interpolated.
Type: Sequence of scalars; !!seq
Encoding: UTF-8 encoded string of characters (RFC 3629); !!str

method → retrieval → weight
Description: Interpolation weight.
Type: Scalar
Encoding: A decimal integer or floating-point number; !!int or !!float
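A minimal Python sketch of linear score interpolation between the two components listed under interpolates. The schema does not specify which component the weight applies to; this sketch assumes it applies to the first listed component and that the second receives 1 - weight. It also assumes that a document missing from one run contributes a score of 0 there. The function name interpolate and the sample scores are hypothetical; in practice, scores from different stages are often normalized before interpolation because they can live on different scales.

```python
def interpolate(run_a, run_b, weight):
    """Linearly interpolate two runs: weight * a + (1 - weight) * b.

    Documents missing from one run contribute 0 for that run.
    """
    docs = set(run_a) | set(run_b)
    return {
        doc: weight * run_a.get(doc, 0.0) + (1 - weight) * run_b.get(doc, 0.0)
        for doc in docs
    }


axiomatic = {"d1": 1.0, "d2": 0.5}  # hypothetical scores of the first component
ensemble = {"d1": 0.0, "d2": 1.0, "d3": 0.5}  # hypothetical scores of the second
fused = interpolate(axiomatic, ensemble, weight=0.6)
print(sorted(fused, key=fused.get, reverse=True))  # ['d2', 'd1', 'd3']
```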
Example
method:
  automatic: true
  indexing:
    tokenizer: org.apache.lucene.analysis.standard.StandardTokenizer
    stemmer: org.apache.lucene.analysis.en.PorterStemFilter
    stopwords: org.apache.lucene.analysis.standard.StandardAnalyzer.STOP_WORDS_SET
  retrieval:
    - name: bm25
      method: org.apache.lucene.search.similarities.BM25Similarity
      b: 0.4
      k1: 0.9
    - name: axiomatic reranker
      method: io.anserini.rerank.lib.AxiomReranker
      rerankCutoff: 20
      axiom.deterministic: true
      reranks: bm25
    - name: lr reranker
      method: sklearn.linear_model.LogisticRegression
      reranks: axiomatic reranker
    - name: svm reranker
      method: sklearn.svm.SVC
      reranks: axiomatic reranker
    - name: lgb reranker
      method: lightgbm
      reranks: axiomatic reranker
    - name: ensemble
      ensembles:
        - lr reranker
        - svm reranker
        - lgb reranker
    - name: interpolation
      weight: 0.6
      interpolates:
        - axiomatic reranker
        - ensemble