Cranfield-style evaluations standardised Information Retrieval (IR) evaluation practices, enabling the creation of programmes such as TREC, CLEF, and INEX, and long-term comparability of IR systems. However, the methodology does not translate well into the Interactive IR (IIR) domain, where the inclusion of the user into the search process and the repeated interaction between user and system creates more variability than the Cranfield-style evaluations can support. As a result, IIR evaluations of various systems have tended to be non-comparable, not because the systems vary, but because the methodologies used are non-comparable. In this paper we describe a standardised IIR evaluation framework, that ensures that IIR evaluations can share a standardised baseline methodology in much the same way that TREC, CLEF, and INEX imposed a process on IR evaluation. The framework provides a common baseline, derived by integrating existing, validated evaluation measures, that enables inter-study comparison, but is also exible enough to support most kinds of IIR studies. This is achieved through the use of a \pluggable" system, into which any web-based IIR interface can be embedded. The framework has been implemented and the software will be made available to reduce the resource commitment required for IIR studies.
|Publication status||Published - 2013|
|Event||Conference & Labs of the Evaluation Forum (CLEF) - Valencia, Spain|
Duration: 23 Sept 2013 → 26 Sept 2013
|Conference||Conference & Labs of the Evaluation Forum (CLEF)|
|Period||23/09/13 → 26/09/13|