I would like to make some general comments (again).
I suggest that one entry in the germline set should contain the germline sequence of a variable (V) gene segment at the top,
together with
cross-references to entries in other databases
evidence
based on data present in public databases
(maybe with the possibility of adding one's own data that have not been submitted to public databases).
It should link to the germline region if the genome location is known
(maybe include the germline sequence of a window around the gene, including the promoter region; when I did sequence analyses, a region of about 800 bp per V gene segment was sufficient most of the time).
It should point to rearranged examples of the germline sequence if it is a functional germline sequence.
What information should be added to rearranged examples? Here is the chance to also add information on antigen specificity, combinations of both V gene segments when known, etc. Such data would make it possible in the future to create maps showing which V gene segments are used in which antigen-specific responses. In the case of TCR segments, information on MHC restriction and peptide specificities should (or could) also be added.
Big question: should somatically mutated variants also be added?
Metadata should be added (such as type of variable gene segment, species, etc., as discussed).
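To make the proposal above more concrete, here is a minimal sketch of what one entry could look like as a data structure. All class and field names are my own illustrative assumptions, not an agreed schema, and the rule that only functional germline sequences get rearranged examples is simply my reading of the suggestion above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CrossReference:
    database: str   # name of the other database, e.g. "GenBank" or "EMBL"
    accession: str  # accession in that database (placeholder values only)


@dataclass
class RearrangedExample:
    cross_ref: CrossReference
    antigen: Optional[str] = None          # antigen specificity, if known
    paired_segment: Optional[str] = None   # partner V gene segment, if known
    mhc_restriction: Optional[str] = None  # TCR segments only
    peptide: Optional[str] = None          # TCR segments only


@dataclass
class GermlineEntry:
    name: str           # unique name of the germline V gene segment
    sequence: str       # germline sequence, kept at the top of the entry
    segment_type: str   # metadata: type of variable gene segment
    species: str        # metadata: species
    functional: bool    # whether this is a functional germline sequence
    cross_refs: List[CrossReference] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)  # public or own data
    genome_location: Optional[str] = None   # set only if the location is known
    flanking_window: Optional[str] = None   # e.g. ~800 bp including promoter
    rearranged_examples: List[RearrangedExample] = field(default_factory=list)

    def add_rearranged_example(self, example: RearrangedExample) -> None:
        # Assumption: only functional germline sequences point to
        # rearranged examples.
        if not self.functional:
            raise ValueError("rearranged examples need a functional entry")
        self.rearranged_examples.append(example)
```

Whether somatically mutated variants belong in `rearranged_examples` or in a field of their own is left open here, as it is in the question above.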
It is important that
the entries are cross-checked to avoid duplications
a condensed-format germline V gene database is available, containing only the unique name and the sequence, with a link to the complete entry.
Ideally, subsets specific to particular types of variable gene segments and/or particular species should be generated as well.
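The three points above (cross-checking for duplicates, a condensed format, and subsets) could be sketched roughly as follows. This is only an illustration under my own assumptions: entries are plain dictionaries with the keys `name`, `sequence`, `species` and `segment_type`, two entries count as duplicates when they share species and sequence, the condensed format is FASTA-like, and `https://example.org/entry/` is a placeholder for wherever the complete entries would live.

```python
def normalise(sequence):
    """Remove whitespace and upper-case, so formatting does not hide duplicates."""
    return "".join(sequence.split()).upper()


def find_duplicates(entries):
    """Return (first_name, duplicate_name) pairs sharing species and sequence."""
    seen = {}
    duplicates = []
    for entry in entries:
        key = (entry["species"], normalise(entry["sequence"]))
        if key in seen:
            duplicates.append((seen[key], entry["name"]))
        else:
            seen[key] = entry["name"]
    return duplicates


def condensed(entries, base_url="https://example.org/entry/",
              species=None, segment_type=None):
    """Build a condensed listing: unique name, link to full entry, sequence.

    Passing species and/or segment_type restricts the output to that subset.
    """
    lines = []
    for entry in entries:
        if species is not None and entry["species"] != species:
            continue
        if segment_type is not None and entry["segment_type"] != segment_type:
            continue
        lines.append(">" + entry["name"] + " " + base_url + entry["name"])
        lines.append(normalise(entry["sequence"]))
    return "\n".join(lines)
```

Calling `condensed(entries, species="mouse")` would, for example, give the mouse-only subset in the condensed format, while `find_duplicates(entries)` would be run before any new entry is accepted.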
One more comment:
At IMGT we discussed early in the project whether we should start a specialised public database that would accept antibody and T cell receptor sequences on behalf of the big databases (GenBank, EMBL), moving rearranged sequences from the general databases to the specialised database and creating special entries with germline sequences for the big databases. This idea did not get anywhere; maybe it could be discussed again.
For HLA sequences, the scientific community agreed that all HLA sequences are submitted to a specialised HLA sequence database, where the sequences are also checked and named, and only then made publicly available. For TCR and antibody genes, such a system was never established.