The Google thing does work; see Example 6 in our VDJviz paper for quite an interesting case.
The thing with public clonotypes is that a clonotype can be either rare or frequent in the population. Ignoring individual disease history, two factors influence this:
1. The probability of VDJ rearrangement, see Elhanati 2014. Namely, you are far more likely to find a clonotype with a single added N-nucleotide in the V-J junction than one with 10 N-nucleotides.
2. The probability of passing thymic selection and proliferating in the peripheral blood. These are far more complex. For thymic selection see Košmrlj 2008: a high fraction of strongly interacting amino acids in the CDRs lowers the probability of passing selection.
Computing 1. and 2. yields results that are in good agreement with simply taking a big dataset of RepSeq samples and computing the incidence of each clonotype, so getting public sequences is not much of a problem in general.
Note that public clonotypes make an ideal "reference set" for comparing samples: overlapping them across various samples gives you a dense incidence matrix that is far easier to handle.
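To make the "incidence" and "reference set" ideas concrete, here is a minimal sketch in plain Python. The sample names, the CDR3 sequences and the `min_samples` threshold are all made up for illustration; in practice you would load clonotype tables from your RepSeq pipeline and choose a threshold that fits your cohort size.

```python
from collections import Counter

# Hypothetical toy data: sample name -> set of clonotypes (CDR3 amino acid sequences).
# Real input would come from clonotype tables produced by a RepSeq pipeline.
samples = {
    "donor_A": {"CASSLGQGNTEAFF", "CASSPGTGGYEQYF", "CASSLAPGATNEKLFF"},
    "donor_B": {"CASSLGQGNTEAFF", "CASSPGTGGYEQYF", "CASSQDRGNQPQHF"},
    "donor_C": {"CASSLGQGNTEAFF", "CASRTGESYGYTF"},
}


def incidence(samples):
    """Count, for every clonotype, the number of samples it occurs in."""
    counts = Counter()
    for clonotypes in samples.values():
        # Each sample is a set, so a clonotype is counted once per sample.
        counts.update(clonotypes)
    return counts


def public_matrix(samples, min_samples=2):
    """Build a samples x public-clonotypes presence/absence matrix.

    A clonotype is treated as public if it occurs in at least
    `min_samples` samples (an assumed, adjustable cut-off).
    """
    counts = incidence(samples)
    public = sorted(c for c, n in counts.items() if n >= min_samples)
    names = sorted(samples)
    matrix = [[1 if c in samples[s] else 0 for c in public] for s in names]
    return names, public, matrix


names, public, matrix = public_matrix(samples)
for name, row in zip(names, matrix):
    print(name, row)
```

Restricting the columns to public clonotypes is what keeps the matrix dense: private clonotypes would each add a column that is zero everywhere except in one sample.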
The things that I believe are of interest here:
- Donor MHC - the absence/presence of certain public clonotypes should be a good predictor of MHC haplotype
- Disease associated public clonotypes - see http://friedmanlab.weizmann.ac.il/McPAS-TCR/
- Tissue specificity - the database you've referenced
Right now none of these is published, so the easiest way to make something public is to share it on GitHub as plain text so that Google can index it.