W Zhang, IM Wang, C Wang, L Lin, X Chai, J Wu, AJ Bett, G Dhanasekaran, DR Casimiro and X Liu,
Frontiers in immunology, 2016
Large-scale study of the properties of T-cell receptor (TCR) and B-cell receptor (BCR) repertoires through next-generation sequencing is providing excellent insights into the understanding of adaptive immune responses. Variable(Diversity)Joining [V(D)J] germline genes and alleles must be characterized in detail to facilitate repertoire analyses. However, most species do not have well-characterized TCR/BCR germline genes because of their high homology. Also, more germline alleles are required for humans and other species, which limits the capacity for studying immune repertoires. Herein, we developed "Immune Germline Prediction" (IMPre), a tool for predicting germline V/J genes and alleles using deep-sequencing data derived from TCR/BCR repertoires. We developed a new algorithm, "Seed_Clust," for clustering, produced a multiway tree for assembly and optimized the sequence according to the characteristics of rearrangement. We trained IMPre on human samples of T-cell receptor beta (TRB) and immunoglobulin heavy chain and then tested it on additional human samples. Accuracy of 97.7, 100, 92.9, and 100% was obtained for TRBV, TRBJ, IGHV, and IGHJ, respectively. Analyses of subsampling performance for these samples showed IMPre to be robust using different data quantities. Subsequently, IMPre was tested on samples from rhesus monkeys and human long sequences: the highly accurate results demonstrated IMPre to be stable with animal and multiple data types. With rapid accumulation of high-throughput sequence data for TCR and BCR repertoires, IMPre can be applied broadly for obtaining novel genes and a large number of novel alleles. IMPre is available at https://github.com/zhangwei2015/IMPre.