Looking at some of the existing schemas for VDJ rearrangement data (Change-O, VDJML, etc.), it seems that many fields are “derived”: they could be computed from a smaller core set of fields. For example, given the coordinates of the start and end of the CDR3, we could compute the junction length. To what extent do we want to incorporate derived fields? Should we also mandate exactly how they are to be computed from the core fields?
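To make the second question concrete, here is a minimal sketch of two derived fields computed from CDR3 coordinates. The function names are hypothetical. It assumes the IMGT definition of the junction (CDR3 plus the conserved codon on each side, i.e. 3 nt per side). It also shows that the same pair of coordinates gives different lengths depending on whether the end coordinate is inclusive or exclusive, which is exactly the kind of detail a spec would need to pin down.

```python
# Hypothetical derived-field helpers; not taken from any existing schema.

def cdr3_length_half_open(cdr3_start: int, cdr3_end: int) -> int:
    """CDR3 length in nt, assuming 0-based, half-open [start, end) coordinates."""
    return cdr3_end - cdr3_start


def cdr3_length_inclusive(cdr3_start: int, cdr3_end: int) -> int:
    """CDR3 length in nt, assuming 1-based, closed [start, end] coordinates."""
    return cdr3_end - cdr3_start + 1


def junction_length(cdr3_len: int) -> int:
    """Junction length assuming the IMGT definition: the CDR3 plus the
    conserved 2nd-CYS codon before it and the J-TRP/J-PHE codon after it."""
    return cdr3_len + 6


# The same two numbers, read under two conventions, disagree by one.
start, end = 300, 336
print(cdr3_length_half_open(start, end))                   # 36
print(cdr3_length_inclusive(start, end))                   # 37
print(junction_length(cdr3_length_half_open(start, end)))  # 42
```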
On a related note, most of these fields can be computed from annotations that are “lifted over” from the alignment to an associated germline sequence, specifically the locations of the boundaries of all the FWRs and CDRs. Are these few data points in fact sufficient to cover our needs? Is the germline group on board with ensuring that a “valid” germline set includes all the annotations we need?
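As an illustration of what “lifting over” could mean in practice, the sketch below maps FWR/CDR boundaries annotated on a germline onto the query through a pairwise alignment given as two gapped strings of equal length. The function names, the 0-based half-open coordinates, and the choice to return `None` when a boundary base is deleted in the query are all assumptions made for this example, not an agreed convention.

```python
from typing import Dict, Optional, Tuple


def germline_to_query_map(query_aln: str, germline_aln: str) -> Dict[int, int]:
    """Map each 0-based germline position to the 0-based query position aligned
    to it. Germline positions aligned to a gap in the query are omitted."""
    if len(query_aln) != len(germline_aln):
        raise ValueError("aligned strings must have equal length")
    mapping: Dict[int, int] = {}
    q = g = 0  # ungapped positions in query and germline
    for qc, gc in zip(query_aln, germline_aln):
        if qc != "-" and gc != "-":
            mapping[g] = q
        if qc != "-":
            q += 1
        if gc != "-":
            g += 1
    return mapping


def lift_region(region: Tuple[int, int],
                mapping: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Lift a half-open germline region [start, end) onto the query.
    Returns None if either boundary base is deleted in the query."""
    start, end = region
    if start not in mapping or (end - 1) not in mapping:
        return None
    return mapping[start], mapping[end - 1] + 1


# Toy alignment: one germline base deleted in the query, one base inserted.
germline_aln = "ACGTAC-GT"
query_aln    = "AC-TACCGT"
m = germline_to_query_map(query_aln, germline_aln)
print(lift_region((0, 4), m))  # (0, 3)
print(lift_region((4, 8), m))  # (3, 8)
print(lift_region((2, 4), m))  # None
```

Even this toy case raises policy questions (what to report when a boundary base is deleted, or when an insertion sits exactly on a boundary) that the germline annotations alone do not answer.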