Below is the draft standard recommendation from the Software WG. It’s a group effort, but the heavy lifting was done by @w.lees.
We are soliciting comments on this draft. Please reply to comment! No comment too big or too small.
AIRR Software WG - Guidance for AIRR Software Tools
Version 1.0 (when finalised)
The Adaptive Immune Receptor Repertoire (AIRR) Community will benefit greatly from cooperation among groups developing tools and resources for AIRR research. The goal of the AIRR Software Working Group is to promote standards for AIRR software tools and resources in order to enable rigorous and reproducible immune repertoire research at the largest scale possible. As one contribution to this goal, we have established the following standards for software tools. Authors whose tools comply with this standard will, subject to ratification by the AIRR Software WG, be permitted to advertise their tools as being AIRR-compliant. To comply, a tool must:
- Be published in source code form, and hosted on a publicly available repository with a clear versioning system.
- Be published under a licence that permits free access, use, modification, and sharing (under terms no more restrictive than the original licence) for non-commercial purposes. Examples of suitable licences, from that point of view, are the Creative Commons Attribution and Attribution-ShareAlike licences, with or without the NonCommercial restriction.
- Support community-curated standard file formats and strive for modularity and interoperability with other tools. In particular, tools must read and write the AIRR Data Standards formats that apply to the data they handle.
- Include example data (in AIRR standard formats where applicable), and checks for expected output from that data, in order to provide a minimal example of functionality allowing users to check that the software is performing as described.
- Provide information about run parameters as part of output.
- Provide a Dockerfile that enables a Docker image to be built such that the tool can be used within a container running that image.
- Ensure that the Dockerfile is kept up to date by providing an image on Docker Hub that is automatically kept in sync with the current release version of the tool.
- Provide at least some level of user support, making it clear what level of support users can expect, and how to obtain it.
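One way to meet the run-parameter requirement above is to emit a small machine-readable record with every run. The Python sketch below is illustrative only: the tool name, version string, parameter names, and JSON layout are all assumptions made for the example and are not part of this standard.

```python
import json
import sys
from datetime import datetime, timezone

TOOL_NAME = "example-airr-tool"  # hypothetical tool name
TOOL_VERSION = "1.2.0"           # hypothetical version string


def build_run_record(params, argv=None):
    """Collect what a reader would need in order to repeat this run."""
    return {
        "tool": TOOL_NAME,
        "version": TOOL_VERSION,
        # Record the command line exactly as it was invoked.
        "command_line": list(sys.argv if argv is None else argv),
        # Record the resolved parameters, including any defaults applied.
        "parameters": dict(params),
        "started_utc": datetime.now(timezone.utc).isoformat(),
    }


def format_run_record(record):
    """Serialise the record as JSON so it can be stored with the output."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    record = build_run_record({"germline_set": "human_igh", "min_quality": 20})
    print(format_run_record(record))
```

The record could equally be written as a header in the main output file or as a separate file beside it. The point is that the version and parameters travel with the results.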
Open Source Licences, Versioned Repositories
Tools in this field are evolving rapidly. In the interests of reproducibility and transparency, published work should be based on tools (and versions of tools) that other researchers can easily obtain in the future. To that end, AIRR-compliant tools must be published in open repositories such as GitHub or Bitbucket, and we encourage users who publish results to state the version and configuration of each tool they employ.
Community-Curated File Formats
The AIRR Data Representation Working Group has defined standards for the submission of immune receptor repertoire sequencing datasets. Tool authors are requested to support these standards as much as possible, for applicable data sets. The currently implemented standard covers submission to NCBI repositories (BioProject, BioSample, SRA and GenBank). Tool authors can assist by making the submission process as easy as possible and guiding users through it.
Example Data and Checks
Because the installation and operation of the tools in this field can often be complex, we require example data and details of expected output, so that users can confirm that their installation is functioning correctly. Likewise, dependencies (such as germline libraries and other metadata) should be checked when the tool runs, and informative error messages issued if necessary.
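The two checks described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function names, the `DependencyError` class, and the choice of a SHA-256 comparison are all hypothetical. A byte-for-byte comparison only suits tools whose output is deterministic and contains no timestamps; otherwise a field-by-field comparison would be needed.

```python
import hashlib
from pathlib import Path


class DependencyError(RuntimeError):
    """Raised when a required input, such as a germline library, is unusable."""


def check_dependencies(required_files):
    """Fail early, naming every missing or empty file in one message.

    required_files maps a human-readable label to a file path.
    """
    problems = []
    for label, path in required_files.items():
        candidate = Path(path)
        if not candidate.is_file():
            problems.append(f"{label}: file not found at {candidate}")
        elif candidate.stat().st_size == 0:
            problems.append(f"{label}: file at {candidate} is empty")
    if problems:
        raise DependencyError("Cannot start: " + "; ".join(problems))


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(actual_path, expected_path):
    """Compare the tool's output on the example data with the shipped reference."""
    return sha256_of(actual_path) == sha256_of(expected_path)
```

A tool might call `check_dependencies` at start-up and expose `outputs_match` through a self-test command that runs the bundled example data and reports pass or fail.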
Dependencies and Containers
Containers encapsulate everything needed to run a piece of software into a single convenient executable that is largely independent of the user’s environment. Providers of AIRR-related tools must provide a Docker implementation (based on a published Dockerfile) as one download option that users can choose:
- Containers allow users to evaluate a tool easily, without the need to resolve dependencies, configure the environment, etc.
- They also provide a way for users to examine a working implementation, reproduce results, and understand the fine details of installation.
To ensure that they are up to date, containers must be built automatically when the current release version of the tool is updated. We recommend the use of Docker Hub for this purpose. Dockerfiles document dependencies clearly, and make it easy for the maintainer to keep the container’s dependencies up to date in subsequent releases.
Workflows
- At the moment, we do not endorse a specific workflow standard:
  - Technology is evolving too rapidly for us to commit to a particular workflow.
  - Typically, AIRR analysis tools have many options and modes, which would make it difficult to support a ‘plug and play’ environment without unduly restricting functionality.
- As tools and workflows evolve, we will keep the position under review and may make stronger technology recommendations in the future.
- We strongly encourage authors of tools to provide concrete, documented examples of workflows that employ their tools, together with sample input and output data.
- Likewise, we encourage authors of research publications to provide documented workflows that will enable interested readers to reproduce the results: see, for example, https://github.com/cdebourcy/PNAS_immune_aging, which embodies the workflow for de Bourcy et al., PNAS, 2017.
Standard Data Sets
The WG is working separately on the development and evaluation of simulated data sets. Lists of published real-world datasets are maintained in the AIRR Forum Wiki.
Support
Tool authors must provide some level of support for the tool. They must state explicitly what level of support is provided, and explain how support should be obtained. We recommend a method, such as the issue tracker on GitHub, that publishes support requests transparently and links resolutions to specific versions or releases. Users are advised to check that the level of support and the frequency of software updates match their expectations before committing to a tool.
Ratification
Authors may submit tools to the AIRR Software WG requesting ratification against the standard. The submission must include reviewable and itemised evidence of compliance with each Requirement listed above.
The Software WG will, where appropriate, issue a Certificate of Compliance, stating the version of the tool reviewed and the version of the Standard with which compliance was ratified. After receiving a Certificate, authors will be entitled to claim compliance with the Standard, and may incorporate any artwork provided by AIRR for that purpose.
The Software WG will maintain and publish a list of compliant software.
If a tool does not achieve ratification, the Software WG will provide an explanation, and encourages resubmission once issues have been resolved.
Authors must re-submit tools for ratification following major upgrades or substantial modifications. The Software WG may, at its discretion, request resubmission at any time. If a certified tool subsequently fails ratification, or is not re-submitted in response to a request from the Software WG, compliance may no longer be claimed and the associated artwork may no longer be used.
The Software WG may, at its discretion, issue a new version of this standard at any time. Tools certified against previous version(s) of the standard may continue to claim compliance with those versions and to use the associated artwork. Authors wishing to claim compliance with the new version must submit a new request for certification and may not claim compliance with the new version, or use associated artwork, until and unless certification is obtained.