Evaluation datasets etc


I’m on the lookout for datasets that fall into one or more of the following categories:

  • Evaluation datasets e.g. datasets from genotyped individuals, clonal datasets
  • Datasets that have comparison groups e.g. immunised vs. naive, old vs. young etc.
  • Challenging datasets e.g. repertoires from poorly characterised species

plus any other categories where advances in methodology might help. This would serve as a useful resource to benchmark and compare methods. One repo that has some evaluation human IGH data is here.

Looking for BCR datasets containing nonproductive recombinations

I don’t have it broken down as such, but here’s a list of publicly available HTS repertoire data, in case it helps:

  1. Bashford-Rogers, R. J. M. et al. Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations. Genome Res. 23, 1874–84 (2013).
    ENA Accession: ERP002120

  2. Boyd, S. D. et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci. Transl. Med. 1, 12ra23 (2009).
    SRA Accession: SRP001460

  3. Collins, A. M., Wang, Y., Roskin, K. M., Marquis, C. P. & Jackson, K. J. L. The mouse antibody heavy chain repertoire is germline-focused and highly variable between inbred strains. Philos. Trans. R. Soc. B Biol. Sci. 370, 20140236 (2015).
    ENA Accession: PRJEB8745

  4. Freeman, J. D. et al. Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing. Genome Res. 19, 1817–24 (2009).
    SRA Accession: SRA008633

  5. Hoehn, K. B. et al. Dynamics of immunoglobulin sequence diversity in HIV-1 infected individuals. Philos. Trans. R. Soc. B Biol. Sci. 370, 20140241 (2015).
    ENA Accession: ERP000572

  6. Greiff, V. et al. Quantitative assessment of the robustness of next-generation sequencing of antibody variable gene repertoires from immunized mice. BMC Immunol. 15, 40 (2014).
    ENA Accession: ERP003950

  7. Jackson, K. J. L. et al. Human Responses to Influenza Vaccination Show Seroconversion Signatures and Convergent Antibody Rearrangements. Cell Host Microbe, 105–114 (2014).
    dbGaP Accession: phs000760.v1.p1

  8. Jiang, N. et al. Determinism and stochasticity during maturation of the zebrafish antibody repertoire. Proc. Natl. Acad. Sci. U. S. A. 108, 5348–53 (2011).
    SRA Accession: SRA029829

  9. Jiang, N. et al. Lineage structure of the human antibody repertoire in response to influenza vaccination. Sci. Transl. Med. 5, 171ra19 (2013).
    SRA Accession: SRA058972

  10. Michaeli, M. et al. Immunoglobulin gene repertoire diversification and selection in the stomach - from gastritis to gastric lymphomas. Front. Immunol. 5, 1–14 (2014).
    BioProject Accession: PRJNA206548

  11. Mroczek, E. S. et al. Differences in the Composition of the Human Antibody Repertoire by B Cell Subsets in the Blood. Front. Immunol. 5, 1–14 (2014).
    SRA Accession: SRP037774

  12. Ota, M. et al. Regulation of the B Cell Receptor Repertoire and Self-Reactivity by BAFF. J. Immunol. 185, 4128–4136 (2010).
    BioProject Accession: PRJNA79689

  13. Palanichamy, A. et al. Immunoglobulin class-switched B cells form an active immune axis between CNS and periphery in multiple sclerosis. Sci. Transl. Med. 6, 248ra106 (2014).
    BioProject Accession: PRJNA248411

  14. Parameswaran, P. et al. Convergent Antibody Signatures in Human Dengue. Cell Host Microbe 13, 691–700 (2013).
    BioProject Accession: PRJNA205206

  15. Qi, Q. et al. Diversity and clonal selection in the human T-cell repertoire. Proc. Natl. Acad. Sci. U. S. A. (2014).
    dbGaP Accession: phs000787.v1.p1

  16. Stern, J. N. H. et al. B cells populating the multiple sclerosis brain mature in the draining cervical lymph nodes. Sci. Transl. Med. 6, 248ra107 (2014).
    BioProject Accession: PRJNA248475

  17. Tipton, C. M. et al. Diversity, cellular origin and autoreactivity of antibody-secreting cell population expansions in acute systemic lupus erythematosus. Nat. Immunol. (2015). doi:10.1038/ni.3175
    SRA Accession: SRP057017

  18. Vollmers, C. et al. Genetic measurement of memory B-cell recall using antibody repertoire sequencing. Proc. Natl. Acad. Sci. U. S. A. 110, 13463–8 (2013).
    dbGaP Accession: phs000656.v1.p1

  19. Vollmers, C., Penland, L., Kanbar, J. N. & Quake, S. R. Novel Exons and Splice Variants in the Human Antibody Heavy Chain Identified by Single Cell and Single Molecule Sequencing. PLoS One 10, e0117050 (2015).
    SRA Accession: SRP043513

  20. Wang, C. et al. High throughput sequencing reveals a complex pattern of dynamic interrelationships among human T cell subsets. Proc. Natl. Acad. Sci. U. S. A. 107, 1518–23 (2010).
    SRA Accession: SRA010149

  21. Wang, C. et al. Effects of aging, cytomegalovirus infection, and EBV infection on human B cell repertoires. J. Immunol. 192, 603–11 (2014).
    dbGaP Accession: phs000666.v1.p1

  22. Wang, C. et al. B-cell repertoire responses to varicella-zoster vaccination in human identical twins. Proc. Natl. Acad. Sci. U. S. A. (2014).
    dbGaP Accession: phs000817.v1.p1

  23. Warren, R. L. et al. Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes. Genome Res. 21, 790–7 (2011).
    SRA Accession: SRA020989

  24. Weinstein, J. A., Jiang, N., White, R. A., Fisher, D. S. & Quake, S. R. High-throughput sequencing of the zebrafish antibody repertoire. Science 324, 807–10 (2009).
    SRA Accession: SRA008134

  25. Wesemann, D. R. et al. Microbial colonization influences early B-lineage development in the gut lamina propria. Nature 501, 112–5 (2013).
    BioProject Accession: PRJNA212030

  26. Wu, X. et al. Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing. Science 333, 1593–602 (2011).
    SRA Accession: SRP006992

  27. Wu, Y.-C. B. et al. Influence of seasonal exposure to grass pollen on local and peripheral blood IgE repertoires in patients with allergic rhinitis. J. Allergy Clin. Immunol. 134, 604–612 (2014).
    SRA Accession: SRP038092

  28. Zhu, J. et al. Mining the antibodyome for HIV-1-neutralizing antibodies with next-generation sequencing and phylogenetic pairing of heavy/light chains. Proc. Natl. Acad. Sci. U. S. A. 110, 6470–5 (2013).
    SRA Accession: SRP018335

  29. Zvyagin, I. V. et al. Distinctive properties of identical twins’ TCR repertoires revealed by high-throughput sequencing. Proc. Natl. Acad. Sci. U. S. A. 111, 5980–5 (2014).
    SRA Accession: SRP028752
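
In case it helps with working up metadata: here is a minimal Python sketch that turns entries laid out like the ones above (a numbered citation line followed by a "<repository> Accession: <id>" line) into records and groups the accessions by repository. The names `parse_entries`, `group_by_repository` and `CANONICAL` are just illustrative choices of mine, not part of any existing tool, and the sketch assumes each entry has exactly one accession line.

```python
import re
from collections import defaultdict

# Four entries in the same two-line layout as the list above. The
# spelling of "dbGaP" varies on purpose to exercise the normalisation.
SAMPLE = """\
  1. Bashford-Rogers, R. J. M. et al. Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations. Genome Res. 23, 1874–84 (2013).
    ENA Accession: ERP002120

  2. Boyd, S. D. et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci. Transl. Med. 1, 12ra23 (2009).
    SRA Accession: SRP001460

  15. Qi,Q. et al. (2014) Diversity and clonal selection in the human T-cell repertoire. Proc. Natl. Acad. Sci. U. S. A.
    dbGap Accession: phs000787.v1.p1

  18. Vollmers,C. et al. (2013) Genetic measurement of memory B-cell recall using antibody repertoire sequencing. Proc. Natl. Acad. Sci. U. S. A., 110, 13463–8.
    dbGAP Accession: phs000656.v1.p1
"""

# Assumed canonical spellings for the repositories named in the list.
CANONICAL = {"ena": "ENA", "sra": "SRA", "dbgap": "dbGaP", "bioproject": "BioProject"}

ENTRY_RE = re.compile(r"^\s*(\d+)\.\s+(.+?)\s*$")
ACCESSION_RE = re.compile(r"^\s*([A-Za-z]+)\s+Accession:\s*(\S+)\s*$")


def parse_entries(text):
    """Return one dict per entry: number, citation, repository, accession."""
    records = []
    current = None
    for line in text.splitlines():
        entry = ENTRY_RE.match(line)
        if entry:
            # A numbered citation line starts a new entry.
            current = {"number": int(entry.group(1)), "citation": entry.group(2)}
            continue
        acc = ACCESSION_RE.match(line)
        if acc and current is not None:
            # The accession line completes the entry that precedes it.
            repo = acc.group(1)
            current["repository"] = CANONICAL.get(repo.lower(), repo)
            current["accession"] = acc.group(2)
            records.append(current)
            current = None
    return records


def group_by_repository(records):
    """Map each repository name to its accessions, in list order."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["repository"]].append(rec["accession"])
    return dict(grouped)


if __name__ == "__main__":
    for repo, accessions in group_by_repository(parse_entries(SAMPLE)).items():
        print(repo, ", ".join(accessions))
```

Pasting the full list in place of `SAMPLE` should give a per-repository table of accessions that further metadata (species, chain, protocol) could be attached to by hand.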


That’s great! I’ll take a look and work up some metadata to go along with these.


Here are my two cents:

Metadata and processing results are here.
Another ~40 samples with “extreme” age cases are on their way to publication; hopefully I’ll be able to update the list with them soon.


Thanks! Much appreciated. Probably cost more than 2c though :grin:


@javh and @mikhail.shugay – thank you very much for this list. Very helpful!

Jason – have you run pRESTO on any of the datasets in your list? If so, any chance you’d be willing to share your preprocessing scripts like Mike has?


I’ve run a few through, but I haven’t made a serious effort. Most of them appear to be minor variations on one of these three workflows:

UMI barcoded MiSeq

I’m planning to run all the BCR data sets through at some point, and I will certainly share my pipelines when I do, but it’s more of a long-term goal. If there is a specific one you are interested in, just shoot me an email and I’ll figure out the details.


As promised, a link to the complete PBMC T-cell receptor beta sequencing dataset (73 samples, some with replicates) for our aging study: PRJNA316572.

  1. Ruggiero, E. et al. High-resolution analysis of the human T-cell receptor repertoire. Nature Communications 6, 8081 (2015).
    BioProject Accession: PRJNA287162