Benchmarking TCR specificity algorithms is of course the ultimate goal. Benchmarking via cross-validation, however, assumes the algorithm is a TCR clustering one rather than a specificity predictor. For a true prediction algorithm, I think the first step will be running it against the database and against TCR:antigen permutations.
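A minimal sketch of that permutation idea: score the real TCR:antigen pairings from the database, then compare against randomly re-paired (permuted) versions. Everything here is illustrative — `predict_score` is a toy stand-in for whatever predictor is being benchmarked, not any actual algorithm's API.

```python
import random

def predict_score(cdr3, antigen):
    """Toy stand-in for a specificity predictor (higher = predicted binding).

    A real benchmark would call the algorithm under test instead.
    """
    return len(set(cdr3) & set(antigen))

def permutation_benchmark(pairs, n_perm=1000, seed=0):
    """Compare real TCR:antigen pairs against shuffled pairings.

    `pairs` is a list of (cdr3, antigen) tuples from the database.
    Returns an empirical p-value: the fraction of permutations whose
    mean score is at least as high as the real pairings' mean score.
    """
    rng = random.Random(seed)
    cdr3s = [p[0] for p in pairs]
    antigens = [p[1] for p in pairs]
    real_mean = sum(predict_score(c, a) for c, a in pairs) / len(pairs)
    hits = 0
    for _ in range(n_perm):
        shuffled = antigens[:]
        rng.shuffle(shuffled)
        perm_mean = sum(predict_score(c, a)
                        for c, a in zip(cdr3s, shuffled)) / len(pairs)
        if perm_mean >= real_mean:
            hits += 1
    # +1 correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

A predictor with real signal should score the true pairings well above the shuffled ones, yielding a small p-value; a clustering-style method would need a different evaluation entirely, which is the point of the distinction above.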
However, right now the goals are far more humble: some basic meta-analysis and annotating RepSeq samples. The latter is extremely exciting, as it adds a new dimension of analysis that can be applied to published RepSeq studies: imagine if all RNA-Seq papers had been published with no GO enrichment analysis performed.
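The annotation step itself can be as simple as matching a sample's clonotypes against the database. A minimal sketch, assuming the database is a flat list of (CDR3, epitope) records — the names and data layout are illustrative, not the project's actual format:

```python
def annotate_sample(sample_cdr3s, db_records):
    """Annotate each clonotype CDR3 in a sample with matching epitopes.

    `db_records` is an iterable of (cdr3, epitope) tuples; the same CDR3
    may map to several epitopes. Exact CDR3 matching only; fuzzy matching
    (e.g. allowing one mismatch) would be a natural extension.
    """
    db = {}
    for cdr3, epitope in db_records:
        db.setdefault(cdr3, set()).add(epitope)
    return {cdr3: sorted(db.get(cdr3, ())) for cdr3 in sample_cdr3s}
```

From there the GO-enrichment analogy carries over directly: one can test whether clonotypes annotated with a given epitope or pathogen are over-represented in a sample relative to a control repertoire.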
Contributing new sequences will be quite easy once a web interface for submissions is done; the GitHub interface is not that biologist-friendly. For now I'm focused on previously published papers, as there is a ton of information there; even "public" clonotype studies are only the tip of the iceberg. The problem is the great diversity in reporting styles; needless to say, some papers even require running image-processing software to extract the data.