The Human Germline Genes subgroup has proposed guidelines which have some bearing on the schema:
1. Sequences must be reported in a peer-reviewed journal.
2. Sequences must not include ambiguities.
3. A single cDNA-derived sequence cannot be the sole evidence for a germline sequence.
4. The database must include full-length sequences.
5. We exclude all sequences generated by six studies that have high sequencing error rates.
For (1) I have included a list of citations in the gene record. I’ve also added a list of citations for the gene set itself.
For (2) and (4), the schema is more permissive at the moment, and we can consider whether it should be tightened up to impose these restrictions. My inclination, at the moment, is not to do so. The same guidelines might not be applicable to other species, where the germline is not so well described. In any case, I think the discrimination should be made by the curators of the gene set: for example, as discussed in that post, there is sometimes uncertainty over the final nucleotides of the V-gene. Also, the schema might be used in other contexts where the restrictions are not imposed. For the same reason, I have made the citation fields optional rather than mandatory.
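To make the split between schema and curation concrete, here is a rough Python sketch. It is only an illustration: the class, field, and function names are hypothetical and are not taken from the actual schema. The record type accepts any sequence and treats citations as optional, while a separate curator-side check applies the stricter rules on request. A full-length check is left out, since what counts as full-length depends on the gene definition.

```python
from dataclasses import dataclass, field
from typing import List

# Bases treated as unambiguous; any other IUPAC code counts as an ambiguity.
UNAMBIGUOUS_BASES = set("ACGT")


@dataclass
class GeneRecord:
    """Permissive record: hypothetical field names, not the real schema."""
    label: str
    sequence: str
    citations: List[str] = field(default_factory=list)  # optional, may be empty


def curation_issues(record: GeneRecord,
                    require_citation: bool = True,
                    forbid_ambiguity: bool = True) -> List[str]:
    """Return the curator-level rules that a record fails.

    The rules are switchable, so a gene set for a less well described
    species could relax them without changing the record type.
    """
    issues = []
    if require_citation and not record.citations:
        issues.append("no citation")
    if forbid_ambiguity and not set(record.sequence.upper()) <= UNAMBIGUOUS_BASES:
        issues.append("ambiguous bases")
    return issues
```

A record with no citations and an `N` in its sequence is still a valid `GeneRecord`; it only fails when a curator chooses to apply the stricter checks.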
As always, please let me know whether you support this approach or have a different view.
Thanks
William