Thanks to @javh for the list of publicly available HTS data. Most of these are sequenced from mRNA, so they are unlikely to capture the nonproductive repertoire.
However, some used gDNA, allowing the nonproductive repertoire to be captured as well:
Boyd, S. D. et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci. Transl. Med. 1, 12ra23 (2009). SRA Accession: SRP001460
Jiang,N. et al. (2013) Lineage structure of the human antibody repertoire in response to influenza vaccination. Sci. Transl. Med., 5, 171ra19. SRA Accession: SRA058972
Ohm-Laursen and Barington (2007) Analysis of 6912 unselected somatic hypermutations in human VDJ rearrangements reveals lack of strand specificity and correlation between phase II substitution rates and distance to the nearest 3’ activation-induced cytidine deaminase target. J Immunol. 178(7):4322-34. EMBL AM076988–AM083316
However, the first two use VH-internal primers (FR1 or FR2) and the last only looks at a single V gene (VH3-23). There’s also a really nice paper looking directly at the nonproductive allele in mice:
In addition to what @caschramm has listed, these are all the public BCR data sets sequenced from gDNA that I’m aware of:
Jackson,K.J.L. et al. (2014) Human Responses to Influenza Vaccination Show Seroconversion Signatures and Convergent Antibody Rearrangements. Cell Host Microbe, 105–114.
dbGaP Accession: phs000760.v1.p1
Roche 454, gDNA
Kaplinsky, J. et al. Antibody repertoire deep sequencing reveals antigen-independent selection in maturing B cells. Proc. Natl. Acad. Sci. 111, E2622–E2629 (2014).
BioProject Accession: PRJNA248676
Illumina MiSeq 2x150, gDNA
Michaeli, M. et al. Immunoglobulin gene repertoire diversification and selection in the stomach - from gastritis to gastric lymphomas. Front. Immunol. 5, 1–14 (2014).
BioProject Accession: PRJNA206548
Roche 454, gDNA
Parameswaran,P. et al. (2013) Convergent Antibody Signatures in Human Dengue. Cell Host Microbe, 13, 691–700.
BioProject Accession: PRJNA205206
Roche 454, gDNA
Roskin, K. M. et al. IgH sequences in common variable immune deficiency reveal altered B cell development and selection. Sci. Transl. Med. 7, 302ra135–302ra135 (2015).
dbGaP Accession: phs000934.v1.p1
Illumina MiSeq and Roche 454, mRNA and gDNA
Wang,C. et al. (2014) Effects of aging, cytomegalovirus infection, and EBV infection on human B cell repertoires. J. Immunol., 192, 603–11.
dbGaP Accession: phs000666.v1.p1
Roche 454, gDNA/mRNA
Wang,C. et al. (2014) B-cell repertoire responses to varicella-zoster vaccination in human identical twins. Proc. Natl. Acad. Sci. U. S. A.
Brief comment on this: In general, nonsense-mediated mRNA decay (NMD) should destabilize Ig/TCR transcripts with out-of-frame rearrangements. However, the constant regions of IgK and IgL are each encoded by a single exon, so these loci should not be affected, as the stop codon will often arise only in the last exon (not in the J segment). Looking at our single-cell data (mouse), we typically find 25-30% of the Igk or Igl transcripts to be out-of-frame, while the number for Igh is around 15-20%. I never did the statistics on whether the difference is significant, but my main point here would be that NMD seems to be less effective than one would assume.
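For anyone who wants to check whether a difference like this is significant, a two-proportion z-test is one simple option. Below is a minimal sketch, assuming you have the raw counts of out-of-frame and total transcripts per locus. The function and parameter names are illustrative only and do not come from any package, and a z-test is just one reasonable choice here (Fisher's exact test would be another).

```python
import math


def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    k1/n1 and k2/n2 are hypothetical inputs: the number of out-of-frame
    transcripts and the total number of transcripts for each locus.
    Returns (z statistic, two-sided p-value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("totals must be positive")
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)  # pooled proportion under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Note that the normal approximation gets unreliable with small counts, which is easy to run into with single-cell data.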
This is interesting to me… for our human bulk mRNA preps, 3-5% of the reads for which V and J can both be assigned have out-of-frame junctions, and another 3-5% have stop codons. This looks pretty consistent between heavy and light chains, though I haven’t checked systematically. I wonder if this is a function of the species or of the prep…
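To make numbers like these easier to compare across labs, here is a rough sketch of one way to bin reads into productive, out-of-frame, and stop-codon categories. It assumes each read has already been annotated with a junction nucleotide sequence that starts in frame at the conserved Cys codon. That is an assumption on my part, the function names are made up for illustration, and it only looks for stops inside the junction rather than across the whole V(D)J region.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_junction(junction_nt: str) -> str:
    """Classify one junction as productive, out_of_frame, or stop_codon.

    Assumes the sequence begins in frame (hypothetical input convention).
    """
    seq = junction_nt.upper()
    if len(seq) % 3 != 0:
        return "out_of_frame"
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "stop_codon"
    return "productive"


def category_fractions(junctions: list[str]) -> dict[str, float]:
    """Fraction of junctions falling into each category."""
    counts = Counter(classify_junction(j) for j in junctions)
    total = sum(counts.values())
    if total == 0:
        return {}
    categories = ("productive", "out_of_frame", "stop_codon")
    return {name: counts[name] / total for name in categories}
```

Running `category_fractions` separately on heavy- and light-chain reads would give the per-chain breakdown directly.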
I wonder if this may be useful to break down by species… maybe the list can be reposted in a wiki somewhere so that we can help you annotate and update it?
Also, does anyone have experience with accessing data from dbGaP? We don’t otherwise do anything that requires IRB approval, so I don’t even know where to start…
That’s a good idea, @caschramm. It was sort of weird to attach it to the immcantation repo anyway, as that has our new lab member guide (also known as “6 pages of me being snarky”).
Ideally, it should really be a database, so we can sort/search on useful information and cross-reference to publications that have reused the data. I think at a minimum:
Species
Receptor & chain/class
Template (RNA/DNA)
Primer positions (5’RACE, FWR1, CH1, etc)
UMI use and length
Sequencing platform and read length
Not sure what the best/easiest platform for this would be.
Those are good suggestions @caschramm and @ematsen. I’m inclined to think the csv plus sortable table is the better approach for now. It’s not a huge list of papers (yet).
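As a starting point for the CSV approach, here is a minimal sketch of what one record per data set might look like, using the fields suggested above. The column names and helper functions are placeholders I made up, not an agreed schema, and fields that are not stated in this thread are left blank rather than guessed.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DatasetRecord:
    # Hypothetical column names; blank means "not yet annotated"
    citation: str
    accession: str
    species: str = ""
    receptor: str = ""
    chain: str = ""
    template: str = ""
    primers: str = ""
    umi_length: str = ""
    platform: str = ""
    read_length: str = ""


def to_csv(records: list[DatasetRecord]) -> str:
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(DatasetRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


def filter_by(records: list[DatasetRecord], **criteria: str) -> list[DatasetRecord]:
    """Keep records whose fields exactly match every given criterion."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]


# Two entries taken from the list earlier in this thread
records = [
    DatasetRecord("Kaplinsky et al. 2014", "PRJNA248676", receptor="BCR",
                  template="gDNA", platform="Illumina MiSeq", read_length="2x150"),
    DatasetRecord("Parameswaran et al. 2013", "PRJNA205206", species="human",
                  receptor="BCR", template="gDNA", platform="Roche 454"),
]
```

For example, `filter_by(records, template="gDNA", platform="Roche 454")` would pull out only the 454 gDNA data sets, and the output of `to_csv(records)` could back a sortable table.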
I checked again on our human data sets and for most of them the numbers were in the same range (15-25%). It is probably important to note that the amplification process during single-cell PCR typically runs into saturation and is not quantitative due to potential primer bias (sorry, no UMIs at this point). Thus scPCR will likely not capture the quantitative differences between productive and non-productive transcripts.
There are however two data sets that show substantially lower numbers (5-10%), but they also differ in the sampled cell populations. Going back to all data sets, the general trend seems to be that populations with high transcriptional activity (e.g. plasma cells) have lower percentages of non-productive transcripts.
Are you aware of this resource: A public database of memory and naive B-cell receptor sequences
PLOS ONE 08/11/2016 11(8):e0160853. The underlying data can be found in Adaptive’s immuneACCESS database.