Sponsored by the AIRR Community

What standards would you like to see for repertoire analysis tools and resources?

As some of you already know, there is an emerging community, led by @Felix_Breden, Jamie Scott, and @tbkepler, which is working hard to coordinate our emerging field of repertoire analysis. There is an upcoming meeting, which unfortunately is already full, but there will be opportunities for remote participation and I’m dedicated to making sure that we get contributions from the broader community.

As part of this effort, there are several working groups that are putting together white papers on standards and recommendations. I’m part of the tools and resources working group along with @caschramm @sdwfrost @davide.bagnara @kiplingd @bussec @dooley, and I’m posting here to ask for your suggestions to direct that work. Other working groups have started a nice pattern in which we first decide on a set of philosophical principles, then talk more about specifics.

Below is a quick sketch of some principles that @caschramm and I came up with, and I hope you’ll tear them apart and make better ones. [I know that some of the following will read like pie-in-the-sky, but that’s the point here.] Thanks!

Software tools and resources

Community goals

  1. Promote transparent sharing of tools, methods and information, in order to enable review, contribution and continued development
  2. Provide a platform for understanding current experience, methods and their limitations
  3. Discuss and communicate current challenges, providing a channel between tool/resource developers and users
  4. Find ways of describing, sharing and versioning alternative germline gene sets
  5. Develop standardized benchmark data sets and testing frameworks
  6. Strive for modularity and interoperability in independent software components

Software recommendations

Software should

  1. Use OSI-approved open-source licenses and be hosted on publicly available repositories
  2. Be versioned
  3. Include metadata about run parameters as part of the output
  4. Use community-curated standard file formats
  5. Include example data and checks for expected output
  6. Clearly list dependencies and/or provide scripts to build a virtual machine
  7. Have a public support forum or channel, describing the level of support users can expect
  8. Make it clear whether the software is still under active development
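
Recommendations 2, 3 and 5 above could be sketched roughly as follows. This is a minimal illustration in Python, not an agreed community format: the tool name `example-annotator`, the version string, the function names and every field name are hypothetical and only show the kind of information a tool might record alongside its results.

```python
import hashlib
import json
from datetime import datetime, timezone

TOOL_NAME = "example-annotator"  # hypothetical tool name
TOOL_VERSION = "0.1.0"           # hypothetical version string


def build_run_metadata(parameters, input_bytes, timestamp=None):
    """Return a dict describing one run, to be written next to the output."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    return {
        "tool": TOOL_NAME,
        "version": TOOL_VERSION,
        # Sort parameters so two runs with the same settings look identical.
        "parameters": dict(sorted(parameters.items())),
        # A checksum of the input lets others confirm they have the same data.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "timestamp": timestamp,
    }


def metadata_to_json(metadata):
    """Serialize the run metadata as human-readable JSON."""
    return json.dumps(metadata, indent=2, sort_keys=True)


def check_expected_output(actual, expected):
    """Compare output on bundled example data with the expected output,
    ignoring leading and trailing whitespace."""
    return actual.strip().splitlines() == expected.strip().splitlines()


if __name__ == "__main__":
    meta = build_run_metadata(
        {"germline_set": "my-set-v1", "min_score": 20},  # made-up parameters
        b">seq1\nACGT\n",                                # made-up input
    )
    print(metadata_to_json(meta))
```

A tool could write this JSON as a sidecar file or as a header of its main output; which of these fits best depends on the file format the community settles on.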

Biological tools and resources

Community goals

  1. Choose a central, searchable, location to list available protocols and reagents
  2. Develop a framework for a standardized MTA to facilitate sharing of reagents
  3. Develop standardized benchmark biological resources to test library preparation methods

Protocol recommendations

Protocols should

  1. Be made publicly available in full detail
  2. Be kept in a versioned repository and updated as necessary
  3. Novel reagents (plasmids, cell lines, antibodies, etc.) should be made available to all qualified researchers, preferably through an established repository

Develop standardized benchmark biological resources for library preparation methodologies


Thanks! I’ve added it.

Great sketch. Here are my comments on some of the points:

  • Software recommendations 1: Approved by whom? I think a good solution would be to refer to the license lists of the FSF and/or the OSI.

  • Community goals 2: Is the idea to come up with a new MTA or to set up a framework on how to use an existing one? In the first case, we should think about whether there is any material that our community frequently deals with that is not covered by the current standard MTAs. Developing a new MTA is likely a reasonable (legal) effort and people might still not want to share their reagents (i.e. not signing an MTA in the first place).

  • Protocol recommendations 3: Depending on the type of material, there can be a non-negligible cost (for production, QC and storage) to the lab that provides it (this should of course never be an argument for not sharing materials). Nevertheless, it could be helpful to identify and recommend repositories that distribute materials at a fair price, in a non-discriminatory fashion, and can take care of MTAs (e.g. Addgene).

Thanks Christian.

For Community goal 2, I was thinking more along the lines of the first case. It probably would be a reasonable effort, yes, but the hope is that by providing a template where people can just fill in the source, recipient, and item being shared, we might encourage them to share more than they might otherwise. As you say, this will require thinking about issues specific to our community which might be contributing to a reluctance to share.

Thanks, Christian!

I meant OSI-approved, and have updated the text to reflect that.

Some additional suggestions:

Community Goals

  • Promote transparent sharing of tools, methods and information, in order to enable review, contribution and continued development
    (to me, this is the principle that underpins the later recommendations around openness and free availability)
  • Provide a platform for understanding current experience, methods and their limitations
  • Agree and communicate current challenges, providing a channel between tool/resource developers and users

Software should

  • Be usable without restriction by non-profit organisations (an OSI licence would go further than this, but perhaps not everyone will be happy with OSI)
  • Have a public support forum or channel

Software authors should make it clear

  • Whether the software is still under active development
  • How users can obtain support, and what level of support they can expect

Germline gene sets should

  • Be usable without restriction by non-profit organisations
  • Be versioned
  • Reference sources of information
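
As a rough illustration of the three points above, a germline gene set release could carry a small manifest recording its version, a checksum of its sequences, and the sources the entries came from. The sketch below is only one way this might look; the function names, the manifest fields, and the example set name, gene names and source string are all hypothetical.

```python
import hashlib


def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            name = header[0]
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before first header")
        else:
            records[name] += line.upper()
    return records


def build_manifest(set_name, version, fasta_text, sources):
    """Describe one release of a germline gene set."""
    records = parse_fasta(fasta_text)
    # Hash names and sequences in sorted order, so the checksum does not
    # depend on record order, line wrapping or letter case.
    digest = hashlib.sha256()
    for name in sorted(records):
        digest.update(f"{name}\t{records[name]}\n".encode("utf-8"))
    return {
        "name": set_name,
        "version": version,
        "n_sequences": len(records),
        "sha256": digest.hexdigest(),
        "sources": list(sources),
    }


if __name__ == "__main__":
    fasta = ">geneA*01\nACGTACGT\n>geneB*01\nTTGGCCAA\n"  # made-up sequences
    print(build_manifest("demo-set", "1.0.0", fasta, ["hypothetical source"]))
```

Two studies reporting the same set name, version and checksum can then be reasonably sure they annotated against the same sequences, which speaks to the comparability concern without requiring everyone to use a single set.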

Thanks for inviting me to the forum!



A standard MTA might be a good idea, but in practice it will likely be difficult to get it through the process at each originating lab, since it may not meet the requirements that the university’s or company’s legal department sets in order to comply with the laws of the country in question. At a minimum, the framework has to be very flexible.

One substantial problem is comparing studies that use different germline gene sets, numbering systems, etc. Do we want to actively discourage such differences between tools and resources, differences that just reduce our ability to compare results, by recommending the use of one single gene set (at the cost of flexibility)? Will it even be possible to agree on one single standard set, a set that is updated at regular, defined intervals? Who would be able to take on such a task?

We’re generally working from the position that we’re not going to be acting as referee (at least at this stage). So we can recommend the implementation of a uniform naming system, but not the use of a single specific resource. A meta-resource that cross-collates would be great, but I don’t think we have the capacity/funding/etc to launch that as part of the Working Group.

William, thank you very much for this. I especially like your broader scope on the community software goals.

I’ve added most of your thoughts to the master list, with a little bit of compactifying-- please suggest edits as needed.

I look forward to discussing licensing. To me, open source seems like an obvious fit for scientific research.

Erick, personally I agree with your stance on open source. However, I’m aware that some groups make their software available free for academic or non-profit use only (vdj tools is in this category, so I don’t have to mention IMGT again!)
I can understand this approach from a funding point of view, but it’s not strictly open source. I think vdj tools sets a great example in publishing the code despite imposing this restriction. If they are able to attract licence revenue, which I hope they can, this might be a good model for others.

After writing this, I checked Presto and found it uses the Creative Commons non-commercial licence, which isn’t strictly open source, but could be a good option I think: http://creativecommons.org/licenses/by-nc/3.0/us/

I agree with the already mentioned suggestions about software. Our group is currently developing software to be released under the MIT licence.
One thing I would add is a recommendation that, if possible, new software be packaged with a package manager such as bioconda to allow for full installation with all the required dependencies. I remember at the last antibody meeting there was discussion of an alternative package manager. I like bioconda, but I guess other systems will work equally well. The main thing is that we have mechanisms whereby new software that requires multiple dependencies can be rapidly installed by those who are less experienced with these types of systems.

Based on last week’s call, this should emphasize the second half. Perhaps “Describe the level of support users can expect and how to obtain it”?

For repositories of reagents there are several good ones that already exist: Addgene for plasmids (as I think Christian already pointed out), ATCC for cell lines, JAX for mice etc. There are licensing and MTA issues, as has also already been discussed. I don’t think we need to reinvent the wheel here.

Notes from a conference call

@davide.bagnara had a nice idea to make a protocol template, along with goals for what a protocol should have.

@bussec started a discussion that led to http://figshare.com, which mints DOIs and has versioning.

Perhaps not going to be possible. @Nina_Luning_Prak mentioned the Uniform Biological Material Transfer Agreement, which was a major effort but didn’t fundamentally change the fact that every university has its own MTA.

@Nina_Luning_Prak mentioned that the CAP surveys are a fact in Pathology. Here’s a special issue devoted to this sort of validation in Pathology.

There are also now lots of such resources for the human microbiome.

:star: Put on agenda for the final meeting.

Antibodies can be shipped as plasmids, and stored at Addgene. @bussec points out that researchers need to take it from there to actually express the antibody, but perhaps there are companies that will do this?

General discussion

@bussec suggests having software, protocols, and biologicals as three categories.