Sponsored by the AIRR Community

Statistical classifiers for diagnosing disease from immune repertoires

I want to share our preliminary work on developing statistical classifiers for immune repertoires. The main idea in our paper is to score snippets (6-mers) of each CDR3 sequence by their biochemical features using a “detector” function, then aggregate the scores into a single value that represents a diagnosis. We believe this is an important step toward using the information contained in each individual sequence instead of relying on summary statistics of repertoires (e.g., diversity scores).
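
To make the idea concrete, here is a minimal sketch of the pipeline described above: split each CDR3 into 6-mers, turn each 6-mer into biochemical features, score it with a detector, and aggregate over the repertoire. Everything specific here is an assumption for illustration and is not taken from the paper: the feature set (Kyte-Doolittle hydropathy plus a crude charge indicator), the logistic form of the detector, the max aggregation, and all function names. In practice the weights would be fit on labeled repertoires; the values below are untrained placeholders.

```python
import math

# Kyte-Doolittle hydropathy index per amino acid (stand-in biochemical feature).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Crude side-chain charge indicator (assumption: all other residues count as 0).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

K = 6  # snippet length, as in the post


def kmers(cdr3, k=K):
    """Return every contiguous k-mer of a CDR3 amino acid sequence."""
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def featurize(kmer):
    """Flatten per-residue features into one vector (2 features per position)."""
    feats = []
    for aa in kmer:
        feats.append(HYDROPATHY[aa])
        feats.append(CHARGE.get(aa, 0.0))
    return feats


def detector(kmer, weights, bias):
    """Hypothetical detector: logistic function of a weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, featurize(kmer)))
    return 1.0 / (1.0 + math.exp(-z))


def repertoire_score(cdr3s, weights, bias):
    """Aggregate snippet scores into one value (here: the maximum score)."""
    scores = [detector(km, weights, bias) for s in cdr3s for km in kmers(s)]
    return max(scores) if scores else 0.0


# Toy usage with made-up sequences and untrained weights.
repertoire = ["CASSLGQAYEQYF", "CASSPDRGNTEAFF"]
weights = [0.01] * (2 * K)
score = repertoire_score(repertoire, weights, bias=0.0)
```

Taking the maximum means a single strongly scoring snippet can drive the repertoire-level value; a mean or a sum would be equally easy to swap in, and which one is appropriate depends on the disease model you have in mind.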

Neat paper!

From your Abstract:

…prior methods to date have been limited to focusing on repertoire-level summary statistics, ignoring the vast amounts of information in the millions of individual immune receptors comprising a repertoire. We have developed a novel method that addresses this limitation by using innovative approaches for accommodating the extraordinary sequence diversity of immune receptors and widely used machine learning approaches.

And from the Conclusions:

Our method is the first to apply statistical learning to immune repertoires to aid disease diagnosis, learning repertoire-level labels from the set of individual immune repertoire sequences.

You might be interested in these papers:

Thanks.

Perhaps we should have phrased that sentence (and maybe another one like it) differently. The last two papers came out after we had submitted for peer review.

I think our approach is distinct in that it emphasizes sequence-level features, rather than features that summarize a cluster or an entire repertoire.