Manifesto on Interactive Information Retrieval Resource Reuse

This manifesto on resource re-use wants to draw the attention of the interactive information retrieval (IIR) community to the challenges of research documentation and archiving for future use. It advocates the uptake of sharing and re-use in the IIR research community in order to support validation, standardization and ultimately comparability or even reproducibility of IIR research. While IIR prides itself on the heterogeneity of research approaches and the inclusiveness of the community, it is a sign of a maturing research area that methodologies and research designs increasingly adhere to certain standards and protocols. Some standardization of empirical research can already be observed in the IIR community, such as the evaluation of an experiment and its appropriate documentation.

While our vision of an ideal future does not include total conformity of research questions or methodologies, this manifesto does stress the importance of well-documented and - in the best of all cases - re-usable and shareable resources in IIR research in order to progress the discipline. Only if we successfully trace, document and reference our research designs, data and - optimally - infrastructure, will we be able to build stably on top of the empirical and experiment-based research foundation that has already been established for IIR and move the state of the art forward without the risk of repeating research efforts needlessly.

Contributing to the Manifesto

This is a living document intended for discussion. If you want to join the discussion to update the manifesto, please submit an new issue or contribute to an existing issue.

Defining Resource and Re-use

Resources

Research is a complex process that is supported by three main resource types that should be documented and re-used. As shown in Figure, we view these resources as interlocking puzzle pieces representing the different elements of the research process: (1) research design, (2) research infrastructure, and (3) research data.

Research design, the first type of resource, is defined here as the methods and techniques used to collect and analyse the empirical data. This includes the design of the research environment (e.g., location of study, study participants, tasks), any data collection protocols or other instruments used (including the questions and scales) as well as the data analysis methods and measures.

We do not consider the physical research environment to be part of the research design, but part of the research infrastructure instead. In many research fields and in IIR in particular, a technical infrastructure is also required and designed to handle the experimental procedure and tools and support the overall execution of the research design. In IIR, the research infrastructure usually provides access to an IR system (including software, interfaces and collections) as well as the application of the data collection techniques such as user management, pre- and post-test questionnaires, assignment and randomization of user tasks, interaction logging, and other components.

Finally, the type of resource that has arguably received the most attention in recent years is research data. An increasing focus on open access and open science has been responsible for promoting research data publication and the further development of research data repositories. Research data is typically defined quite broadly,for example as “the output of any systematic investigation involving a process of observation, experiment or the testing of a hypothesis, which when assembled in context and interpreted expertly will produce new knowledge”

For the purposes of the IIR community and this manifesto, we view research as a holistic process and consider the term ‘resource’ to cover any component belonging to one of these three resource types: research design, research infrastructure and research data.

Re-use

For the purposes of the IIR community and this manifesto, we define re-use in its broadest sense as use of research data, research designs or infrastructure for more than an individual purpose.

Principles to Support Re-Use in IIR

We now present eight principles for improving resource re-use in the IIR community. At the core lie the principles on Knowledge (#1) and Terminology (#2) that underpin everything else. The remaining principles are split into those aimed at the individual researcher - Plan (#3), Document (#4), and Publish (#5) - and those aimed at the IIR community as a whole - Require (#6) and Reward (#7). Around all of these sits the Strive for more! principle (#8) that encourages all of us, both individually and as a community, to move re-use forward.

For each principle, we describe who we argue is primarily responsible for it as well as which of the three resource types the principle addresses.

All principles apply to all resource types, but may be more relevant for some than for others.

Principle 1: Make tacit knowledge explicit

All implicit knowledge, choices, and study aspects should be noted, codified, and made explicit.

Transforming existing protocols and research methods, with all the distinct elements that they consist of, into a study requires a large number of choices, that are often based upon implicit knowledge and experience. These implicit aspects of the study are often hard to codify, but they need to be documented and made explicit to enable any re-use. This extends into the study itself, where checklists and notes should be used to capture exceptions and unusual aspects or events.

Inspired by the Manifesto for Research Services \citep{chesbrough06research}, this principle helps not only with reproducibility, but also reduces the risk of biases and other issues that later call into question the study’s results. Additionally, by describing the tacit knowledge of a study and comparing it with others, it becomes possible to gradually develop and merge that tacit knowledge into the research design, formalisms and procedures, which can then be more easily codified and become common ground.

Concrete actions

Prepare a checklist of the practical matters to take into account before, during, and after the experiment.
Take notes, screenshots (and photos or videos where appropriate) during your experiments.
Take note of actions and decisions that are not part of standardized research designs and not on the checklist.
Take note of aspects that are difficult to reproduce and describe them.

Discuss via issues

Principle 2: Define your terminology

All concepts and terms should be precisely defined, ideally referring to existing, published definitions.

Even though many concepts and terms are common in the community, there are often multiple definitions with subtle or not so subtle differences. The specific definitions used in a study should be made explicit, including references to publications where they exist. Where existing definitions are not used, a clear explanation should be provided for why new definitions are required. The terminology should also be used precisely and consistently throughout the study.

This is particularly important as IIR is a highly interdisciplinary field, with researchers coming from various academic backgrounds, who might not all be familiar with the same concepts and terminology, or who would assume different definitions based on their disciplinary background and training. Reproducibility, for instance, is defined by \citet{Claerbout:1992} as the same researchers using the same experimental setup, whereas the ACM definition \cite{ACM:2020} refers to this as repeatability. Instead, ACM defines reproducibility as a different research group using a different setup to confirm the reported outcomes and findings. The potential misunderstandings that could arise from such subtle differences could be avoided by providing clear and specific definitions.

Concrete actions

Identify which concepts and terminology you will use while creating your research design.
Look for alternative definitions or related concepts.
Discuss differences in definitions and uses of concepts and terms that you find across publications.
Discuss to what extent they apply to your work.
Argue for the definitions you choose or why you introduce your own.
Cite papers that define concepts that you use including those with alternative definitions that you do not use.

Discuss via issues

Principle 3: Plan the data life cycle

The full data life cycle, covering collection, processing, analysis, publication, and archiving, should be planned before starting any data collection.

Data collection is often time- and resource-consuming. Therefore, IIR research should start with an initial Data Management Plan (DMP) that clearly describes what kind of research data will be collected or generated during the study, how it will be stored, managed, described and analysed, and how it will be shared and archived at the end of the study. This ensures that the collection and usage of data is transparent to the participants and research community.

In the fields of medical and health studies, presenting such as DMP in advance is generally a basic requirement and these plans are usually made publicly available. Increasingly, funding organizations require a DMP to be submitted together with a project proposal in most data-producing research areas. There are several tools, which will support the creation of a DMP, which will also provide guidelines on how to document and archive data appropriately.

Concrete actions

Start with a data management plan and document what data will be collected and for which purpose.
Make sure that your data collection practice follows current data protection and privacy regulations such as the General Data Protection Regulation (GDPR) in the European Union.
Publish your data management plan as part of your research project and/or publication of results.

Discuss via issues

Principle 4: Document your research resources

All aspects of the research resources should be documented in detail.

In principle, everything required to reproduce an experiment, including the raw research data, analyses and systems used should be documented. As an initial minimum requirement, the research design must be documented to enable a basic level of resource re-use. This includes the experimental setup (e.g., location of study, study participants, tasks), any data collection protocols or other instruments used (including the questions and scales) as well as the data analysis and measures.

Documenting the research design must include implicit knowledge and terminology definitions as covered by principles #1 and #2. Additionally, contextual information, such as the study goals, research questions, or associated publications should also be documented. Where available, standardized and machine-readable documentation formats should be used, e.g., the User Study Exchange Format.

In a more advanced re-use context, the research data and research infrastructure (including software) should also be documented in similar ways.

Concrete actions

Review existing practices and guidelines (in the social sciences it is common to archive codebooks for research designs, for example).
Define your data model or use a community standard for research resource documentation.

Discuss via issues

Principle 5: Archive & publish your research resources

All documentation (see principle #4) should be archived in open access, long-term storage to be available for publication reviews and should then be published in a referenceable location.

Publications are generally limited in the amount of space available to authors and details on the research designs, data and infrastructure are frequently the first victim of this limit \citep{huurdeman19multi}. There are several platforms where additional research data and documents can be formally published, e.g., the Open Science Framework, Zenodo, or figshare. Many funding organisations already require that publications must be made available via public open access repositories - some of them can also be used to archive the research designs and other research resources.

A dedicated repository for research designs in IIR does not (yet) exist, but for some resources related to the research designs, specific repositories exist, for example RePaST for tasks. Tasks used in studies should therefore also be provided to those repositories to provide more access points for the community. This may involve some duplication of work, but if machine-readable formats are used, this should reduce the effort required.

Concrete actions

Identify an appropriate open access archive repository.
Separately archive those aspects of the resources that can be freely shared and those that cannot for legal or other reasons.
Include links to the archived resources in the publication submission.
After publication of research outcomes, publish those resource aspects that can be freely shared and update the publication to refer to those.
Publish parts of the resources to specialised repositories, where such exist.

Discuss via issues

Publication venues should require that resources be archived and available to reviewers.

The foundations for this principle are often already in place with publishers. What is needed is that journals and conferences in the IIR field adopt them. This principle also goes hand-in-hand with principle #7, which is focused on rewarding good resource sharing behaviour. Together they should produce better, more co-operative outcomes. Optionally, publication venues could also require that the archived resources are published after the main publication.

To ensure uptake, minimally the research design documentation must become a required part of the publication process. For complete transparency, also the research data and infrastructures should be made available first to reviewers and then - if at all possible - to the public. If a data archival section is already a requirement for a publication, the research design documentation can be integrated there, but if not, then there should be a required section in the paper that includes a referenceable link to the documentation.

Concrete actions

Require that at least documented research designs for IIR studies are available to reviewers at CHIIR.
Petition journals relevant to the IIR field to add research design and data archiving into their publication workflow.

Discuss via issues

Researchers who follow good sharing practices should be visibly rewarded.

Complementing the basic requirements in principle #6, the IIR community should also develop a range of rewards to highlight good reusability practice. Changing the incentives for responsible sharing practices is a problem that requires a coordinated effort by all stakeholders to alter existing reward structures—authors, reviewers, chairs/editors, publishers, funders, institutions, and societies.

One particular reward mechanism that shows promise are badges to acknowledge reusability practices. Promoting or even requiring such badges would signal that the IIR community values these practices. An example of the successful use of such a badging system is the Psychological Science journal, where the adoption of open data badges has had a positive effect, increasing data sharing by a factor of more than ten. Another such initiative is by the Association for Computing Machinery (ACM), which proposes a set of badges to denote functional, reusable, and available research artifacts.

Another approach for rewarding good practices would be to provide incentives towards providing more detail when reporting methodology, study procedures, i.e., the research design, and results. This is achieved by either allowing authors to add extra pages to a publication or provide additional publication venues or categories, dedicated to the reporting of research designs. This would remove the current incentive structure towards providing simplified, ‘clean’ narratives, which can result in the incomplete reporting of research designs or results.

Concrete actions

Allow for adding pages to a publication that are focused on reporting research designs.
Institute a badging system with different levels at CHIIR, ISIC, and topically related journals.
Institute an award for best-documented research paper at conferences like CHIIR.
Add a publication category focused on reporting research designs, such as the experience papers at the 2019 BIIRRR workshop.

Discuss via issues

Principle 8: Strive for more!

Unlike the previous seven principles, which cover the basics required to embed re-use practices into the IIR community, this principle covers where the IIR community could go next. As such, the principle does not contain any concrete steps, only potential actions.

Potential actions

If you have shared your research design and data management plan, the next step is to share your research data and infrastructure. If you have shared your data and research infrastructure, the next step is to add documentation. Documentation can have different levels of detail and explicitness and different levels of structure and adherence to conventions. In the next section we offer concrete suggestions for increasing levels of documentation and sharing.
Beyond sharing the designs, infrastructure and data of your research, you could pre-register your IIR studies, including hypotheses that are to be tested. There is a call for more transparency, including pre-registration of studies and hypotheses, across an increasing number of fields. Although a significant part of our research is exploratory, where we do not have a very specific hypothesis to test, there are many studies where we have very concrete and specific questions and hypotheses. Pre-registration of the design, setup and hypotheses of such a study helps others to understand the original motivation behind the design and setup of the study, so that its components can more easily be re-used and compared against.
Another step beyond sharing resources is to try to reproduce an IIR study. This requires a deep understanding of the theory as well as the research design of the original study. With proper re-use of IIR research resources, this type of reproducibility study could be used for graduate work. Organizing dedicated reproducibility tracks at conferences for Master’s or PhD students and their supervisors could provide great training opportunities as well as a deeper understanding of which original studies truly contribute to our understanding of information seeking behavior.

Discuss via issues

Contributing to the Manifesto

Defining Resource and Re-use

Resources

Re-use

Principles to Support Re-Use in IIR

Principle 1: Make tacit knowledge explicit

Concrete actions

Principle 2: Define your terminology

Concrete actions

Principle 3: Plan the data life cycle

Concrete actions

Principle 4: Document your research resources

Concrete actions

Principle 5: Archive & publish your research resources

Concrete actions

Principle 6: Require basic sharing practices

Concrete actions

Principle 7: Reward good sharing practices

Concrete actions

Principle 8: Strive for more!

Potential actions