A standardized file format for BCR clonal families?


There are a few core analyses that people do on repertoire sequence data. One is identifying germline genes and junctional boundaries; the VDJML team has put a lot of thought into this, and there is a thread about it on this forum.

For BCR sequences, another core analysis is clustering sequences by naive rearrangement event. The resulting clusters are variously called “clones”, “clonal families”, “lineages”, etc.

I’m not aware of any effort to standardize a file format for those sorts of inferences. I didn’t see anything in the VDJML spec, but perhaps I’m wrong, @lindsay.cowell ?


Thanks Eric. We have talked about adding an annotation to each rearrangement for storing clone identifiers, along with annotation about how the clone was identified. Regarding how the clone was identified, we were only thinking of including information similar to what we have for germline alignment, where we track the software used. We hadn’t thought to include anything beyond that though.

It seems important to have that information with each sequence, but at the same time I could imagine a separate file format that has clone as the central entity rather than the individual read.
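
To make the two views concrete, here is a rough sketch of how per-rearrangement clone annotations could be pivoted into a clone-centric structure. Everything in it is an assumption for illustration: the field names (`sequence_id`, `clone_id`, `clone_software`), the tool name, and the helper `to_clone_centric` are hypothetical and not taken from the VDJML spec.

```python
from collections import defaultdict

# Hypothetical per-rearrangement records. The field names and the tool name
# are illustrative only; they are not part of any existing specification.
rearrangements = [
    {"sequence_id": "read1", "clone_id": "c1", "clone_software": "toolA"},
    {"sequence_id": "read2", "clone_id": "c1", "clone_software": "toolA"},
    {"sequence_id": "read3", "clone_id": "c2", "clone_software": "toolA"},
]


def to_clone_centric(records):
    """Group read-centric records so that the clone is the central entity."""
    clones = defaultdict(lambda: {"software": None, "members": []})
    for rec in records:
        clone = clones[rec["clone_id"]]
        # Record which software assigned the clone (assumed to be the same
        # for every member of a given clone).
        clone["software"] = rec["clone_software"]
        clone["members"].append(rec["sequence_id"])
    return dict(clones)


clones = to_clone_centric(rearrangements)
```

The read-centric list keeps the clone identifier with each sequence, while the derived dictionary is the kind of thing a separate clone-centric file might serialize. A real format would presumably carry more per-clone information than this.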