List of downstream analysis software

mikhail.shugay · March 18, 2017, 11:22pm

A list of software developed for statistical and comparative analysis of mapped and clustered V(D)J rearrangements.

This is a wiki page. Please edit and add software you know about. However, note that this is not the place for closed-source commercial software vendors to advertise their products.

Command-line applications and R/Python libraries:

Change-O (Gupta et al. 2015) - a suite for B-cell repertoire sequencing data analysis: clonal lineage and diversity analysis (Alakazam module), somatic hypermutation analysis (SHazaM module) and allele identification (TIgGER).
VDJtools (Shugay et al. 2015) - summary statistics, repertoire diversity, clustering repertoires, CDR3 amino acid composition analysis. Mostly designed for T-cell repertoires.
tcR (Nazarov et al. 2015) - diversity measures, shared T cell receptor sequences identification, gene usage statistics computation. R package, deprecated in favour of immunarch.
immunarch (manuscript is in preparation) - an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires. Main features include automated format detection and parsing, support for all popular data formats, publication-ready plots, clonotype annotation and tracking, clonotype overlap analysis, gene usage analysis, diversity measures, kmer analysis. The main focus is to simplify common analysis workflows.
repgenHMM (Elhanati et al., 2016) - inferring repertoire generation probabilities and statistics of V(D)J rearrangement process.

GUI-based applications (immune repertoire browsers, etc.):

ARResT/Interrogate (Bystry et al., 2016) web-based + standalone
Vidjil-server (Duez et al., 2016) web-based + standalone
VDJviz (Bagaev et al., 2016) web-based + standalone
VDJserver (Cowell et al. 2015) web-based
ClonoPlot (Fähnrich et al. 2017) standalone

psathyrella · March 19, 2017, 12:05am

Maybe it’s just me, but I wonder if you could add a little more detail about what “post-analysis” and “clonotype tables” are?

thanks

mikhail.shugay · March 19, 2017, 12:10am

Fixed. Is it better now? By post-analysis I mean any summary statistics or comparative analysis. “Clonotype tables” are the output of VDJ mapping and clone clustering software, i.e. tables containing clone frequency, V/D/J segments and CDR3 sequence (plus hypermutations for Ig sequences).
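
To make the definition concrete, here is a minimal sketch of a clonotype table and two typical post-analysis summaries (Shannon entropy as a diversity measure, and V gene usage). The column names (`count`, `v`, `d`, `j`, `cdr3aa`) and the example rows are made up for illustration; they are not the output format of any particular tool.

```python
import math
from collections import Counter

# Hypothetical clonotype table: one row per clonotype.
# Column names and values are illustrative assumptions only.
clonotypes = [
    {"count": 50, "v": "TRBV9", "d": "TRBD1", "j": "TRBJ2-7", "cdr3aa": "CASSVGGYEQYF"},
    {"count": 30, "v": "TRBV20-1", "d": "TRBD2", "j": "TRBJ1-1", "cdr3aa": "CSARDRTEAFF"},
    {"count": 20, "v": "TRBV9", "d": "TRBD2", "j": "TRBJ2-1", "cdr3aa": "CASSLGNEQFF"},
]


def add_frequencies(table):
    """Return a copy of the table with a 'freq' column (count / total count)."""
    total = sum(row["count"] for row in table)
    return [dict(row, freq=row["count"] / total) for row in table]


def shannon_entropy(table):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    return -sum(r["freq"] * math.log(r["freq"]) for r in table if r["freq"] > 0)


def v_usage(table):
    """Sum clonotype frequencies per V segment."""
    usage = Counter()
    for row in table:
        usage[row["v"]] += row["freq"]
    return dict(usage)


table = add_frequencies(clonotypes)
entropy = shannon_entropy(table)
usage = v_usage(table)
```

A comparative analysis would compute the same summaries for several such tables (one per sample) and compare them.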

psathyrella · March 19, 2017, 12:16am

Yeah, great. thanks.

psathyrella · March 20, 2017, 5:39pm

Actually, wait, sorry, I’m still confused. I can’t really tell the difference between these three wikis

or at least, unless I’m misunderstanding, there’s as much overlap between the three as there is difference. Perhaps we could combine into one page, maybe as a table with features across the top (similar to this)?

ematsen · March 20, 2017, 5:44pm

I don’t think this is what you are actually talking about, but I should point out that the similarity between the links is artifactual. The thing before the topic number is eye candy. E.g.

List of downstream analysis software

Will take you to

List of downstream analysis software

psathyrella · March 20, 2017, 5:48pm

oh, no, thanks, that was just me being lazy with copy pasting links. fixed now

psathyrella · March 29, 2017, 6:46pm

bump?

Unless I missed a reply

mikhail.shugay · March 29, 2017, 7:11pm

My initial classification was:

One gets raw data and maps it to VDJ
One assembles mapped reads into clonotypes/clones
One analyzes the list of clones (these software packages do not necessarily need to work with raw data)

I think it would be wise to merge

into a single page with a table, but I recall that some RNA-Seq/single-cell RNA-Seq software is not well optimized for conventional amplicon libraries. I’ll think about the table design and post it here; please suggest which columns/parameters we should include in it.

psathyrella · March 29, 2017, 7:23pm

That sounds great, thanks.

As for columns, I was mostly thinking that the headers in the existing wikis would become columns, e.g. “rna-seq data”, “single-cell rna seq”, “complex rearrangements”. But in addition, maybe:

for 1., whether they do raw data processing (e.g. error correction, paired end assembly) in addition to mapping to VDJ

E.Rosati · April 3, 2017, 6:43am

Maybe a column specifying which data format each tool takes as input, to distinguish software for raw data (fastq, fasta) from software that takes clonotype tables of various formats as input, and so on.

A column that distinguishes GUI-based applications from command-line applications (as already done in the list above).

A column specifying if they work for TCR or BCR or both.

And maybe a column specifying the main functions/features of each, so that the overlap between tools and the distinctive features of each become clearer.
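
As a rough sketch of what such a table could look like, the columns suggested above can be written down as a small data structure. The field names, the `render` and `select` helpers, and the choice of example rows are assumptions for illustration. Cell values are taken from the list at the top of this page where it states them and left as `None` (shown as `?`) where it does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tool:
    name: str
    interface: str               # "command-line" or "GUI"
    input_format: Optional[str]  # None = not stated in this thread
    receptor: Optional[str]      # "TCR", "BCR", "both", or None if not stated
    features: str


# Example rows; values come from the wiki list above where available.
TOOLS = [
    Tool("Change-O", "command-line", None, "BCR",
         "clonal lineage, diversity, somatic hypermutation"),
    # The list says "mostly designed for T-cell repertoires", hence "TCR".
    Tool("VDJtools", "command-line", None, "TCR",
         "summary statistics, diversity, repertoire clustering"),
    Tool("ClonoPlot", "GUI", None, None, "not described in this thread"),
]

HEADER = ["Tool", "Interface", "Input", "Receptor", "Main features"]


def render(tools):
    """Render the tools as an aligned plain-text table."""
    rows = [[t.name, t.interface, t.input_format or "?", t.receptor or "?", t.features]
            for t in tools]
    all_rows = [HEADER] + rows
    widths = [max(len(r[i]) for r in all_rows) for i in range(len(HEADER))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in all_rows
    )


def select(tools, **criteria):
    """Return the tools whose fields equal all given criteria."""
    return [t for t in tools
            if all(getattr(t, key) == value for key, value in criteria.items())]


table_text = render(TOOLS)
gui_tools = select(TOOLS, interface="GUI")
```

With one row per tool, a question like "which command-line tools handle BCR data?" becomes a simple filter, e.g. `select(TOOLS, interface="command-line", receptor="BCR")`.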