Hello Jian, I think that I've understood a basic difference in perspective between us.
Our perspective is simply that sequences sit in repertoires, and that it's most powerful to understand them in this way. We can use characteristics of the entire repertoire to tell us about each individual sequence. As a simple example, if we have a large repertoire sample and only one sequence is inferred to use a given germline V gene, that inference is probably incorrect: it would be very improbable for there to be only one rearranged receptor sequence using that gene. Most modern applications of computational tools for immune receptors analyze at least an entire sample at a time, so leveraging properties of the whole repertoire can be useful in practice.
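To make the singleton example concrete, here is a minimal sketch in Python. The function name, the `min_count` threshold, and the gene calls are all made up for illustration; this is not how any particular tool does it, just the simplest form of "use the repertoire to question an individual call".

```python
from collections import Counter

def flag_rare_gene_calls(v_calls, min_count=2):
    """Return indices of sequences whose inferred V gene is seen
    fewer than `min_count` times in the whole sample.

    `v_calls` is a list with one inferred V gene name per sequence.
    The threshold is an illustrative assumption, not a recommendation.
    """
    counts = Counter(v_calls)
    return [i for i, gene in enumerate(v_calls) if counts[gene] < min_count]

# Hypothetical toy sample: the last call is the only use of its gene.
v_calls = ["IGHV3-23"] * 5 + ["IGHV1-69"] * 3 + ["IGHV7-81"]
suspicious = flag_rare_gene_calls(v_calls)
```

In a real sample the threshold would presumably need to scale with sample size and expected gene usage; the point is only that the flag depends on the other sequences in the sample, not on the flagged sequence alone.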
So, if we are going to be testing methods that use whole-repertoire information to say something about each individual sequence, we need to be simulating whole repertoires. Because the result of such a simulation is a function of all of the sequences in the sample, it becomes difficult to disentangle each allele's contribution to accuracy in the simulated repertoire. For example, simulating using two similar alleles will make for a more difficult problem than simulating from two very different ones. If we want to score per-allele performance, do we do so in a simulated repertoire with other such "distracting" alleles or not?
There is also a more practical reason: at the end of the day people are going to want one or a few numbers summarizing performance. If we score performance on each allele individually, that's hundreds of V alleles, and we would then need to test those in combination with the various D and J alleles. The number of combinations is quite large (of course, that's the point) and too much for us to really digest. This motivates summaries across all these trials. I would argue that the most natural summary would be an average performance weighted by the frequency of occurrence of the various alleles, and we are back to looking at the allele frequency again! By simulating from repertoires with these weights built in, we don't have to try all combinations or do any such post-summarization: it comes for free.
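As a rough illustration of the frequency-weighted summary, here is a small Python sketch. The per-allele accuracies and frequencies are invented numbers, and `weighted_accuracy` is a hypothetical helper, not part of any existing tool.

```python
def weighted_accuracy(per_allele_accuracy, allele_frequency):
    """Average per-allele accuracy, weighted by allele frequency.

    Both arguments are dicts keyed by allele name. Frequencies are
    normalized here, so they need not sum to one.
    """
    total = sum(allele_frequency.values())
    weighted_sum = sum(
        per_allele_accuracy[allele] * freq
        for allele, freq in allele_frequency.items()
    )
    return weighted_sum / total

# Invented numbers for illustration only.
accuracy = {"IGHV1-2*02": 0.98, "IGHV1-2*04": 0.90, "IGHV3-23*01": 0.99}
frequency = {"IGHV1-2*02": 0.15, "IGHV1-2*04": 0.05, "IGHV3-23*01": 0.80}

unweighted = sum(accuracy.values()) / len(accuracy)
weighted = weighted_accuracy(accuracy, frequency)
```

With these made-up values the rare, hard allele pulls the unweighted mean down much more than the weighted one. If sequences are simulated in proportion to these frequencies, the plain per-sequence accuracy over the simulated repertoire estimates the same weighted quantity without any explicit reweighting step.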
It is true that there aren't many individuals that have been genotyped by direct sequencing of their unrearranged germline genes. However, there are now many tools that can be used to infer genotypes directly from rearranged sequences. Although these tools are not perfect, I think that we can all agree that a whole-repertoire simulation based on germline inference using these tools is going to be a lot more realistic than simulating a repertoire using all of the alleles in IMGT together. No single individual has ever had all of these alleles.
Thanks for thinking about this with me. Any other voices out there? @javh, @a.collins, @laserson?