Standardizing the format of a germline set

w.lees · October 15, 2016, 10:12am

I have made some small updates to the schema definition in response to comments posted in the document:

ORCID and PubMed ID added to author name and citation respectively, where these items exist.
Field labels 5_UTR and LEADIN changed to 5’UTR and L-REGION to match the labels used in the IMGT Ontology. As I noted in the document, I suggest using the ontology as a starting point, since its terms are defined and quite widely used. Over time we can adapt and extend the definitions as necessary. The full set of labels is at http://www.imgt.org/ligmdb/label and background/citations are at http://www.imgt.org/IMGTindex/ontology.php .

Does anyone have views on the adoption of the IMGT labels as a starting point?

@a.collins, with respect to the comment in your last mail, the schema can accommodate any other labels we need in addition to the ones we already have (5’UTR, L-REGION, FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4), provided that they are annotations which occur at most once per sequence. If we wanted to add annotations that can occur multiple times in a sequence, for example to label hotspots, we’d need to construct something else. That’s certainly doable, but it would not be easy to read or manage in a spreadsheet-type form.
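
To make the distinction concrete, here is a rough Python sketch of the two layouts. It is only an illustration, not the schema itself: the column names (`<label>_start`, `<label>_end`), the function names, the gene name and the coordinates are all made up for the example. The first function gives one spreadsheet-style row per sequence, which works when each label occurs at most once; the second gives one row per annotation, which allows repeats such as hotspots but no longer fits on a single line per sequence.

```python
# Labels currently listed in the schema (ASCII apostrophe used for 5'UTR).
SINGLE_LABELS = ["5'UTR", "L-REGION", "FR1", "CDR1", "FR2",
                 "CDR2", "FR3", "CDR3", "FR4"]


def flat_row(name, annotations):
    """Wide layout: one row per sequence, one start/end column pair per label.

    `annotations` maps a label to a (start, end) tuple. Each label can
    appear at most once, because it owns exactly one pair of columns.
    Column names here are hypothetical.
    """
    row = {"name": name}
    for label in SINGLE_LABELS:
        start, end = annotations.get(label, (None, None))
        row[f"{label}_start"] = start
        row[f"{label}_end"] = end
    return row


def feature_rows(name, features):
    """Long layout: one row per annotation, so a label may repeat.

    `features` is a list of (label, start, end) tuples.
    """
    return [
        {"name": name, "label": label, "start": start, "end": end}
        for label, start, end in features
    ]


# Illustrative values only; not taken from any real germline set.
wide = flat_row("example_gene", {"FR1": (1, 75), "CDR1": (76, 99)})
long_form = feature_rows(
    "example_gene",
    [("hotspot", 10, 13), ("hotspot", 40, 43)],
)
```

The wide form stays readable in a spreadsheet, while the long form would probably have to live in a second sheet or table keyed on the sequence name.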

Are there other labels in addition to the ones listed that we should add at this stage?