Looks like I originally misunderstood what the VEnd, DStart, DEnd, and JStart columns in VDJTools were. These fields are relative to the junction region, not to the full-length sequence as I assumed, so the conversions above are wrong.
Also, in VDJTools the CDR3 refers to the junction region, not the IMGT CDR3 definition, so it includes the conserved residues flanking the IMGT CDR3. Hence, it's two codons longer than the CDR3_IMGT column in Change-O. However, you can just map the JUNCTION column to the cdr3nt column to resolve this.
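To make the two-codon difference concrete, here is a minimal Python sketch. The function name and the example sequence are made up for illustration; the only thing it relies on is that the junction is the IMGT CDR3 plus one conserved codon on each side.

```python
def cdr3_imgt_from_junction(junction_nt):
    """Trim one codon (3 nt) from each end of a junction nucleotide sequence."""
    return junction_nt[3:-3]

# Made-up junction: TGT (conserved Cys) + 12 nt + TGG (conserved Trp)
junction = "TGTGCGAGAGATTACTGG"
cdr3 = cdr3_imgt_from_junction(junction)

# The junction is 6 nt (two codons) longer than the trimmed CDR3
length_difference = len(junction) - len(cdr3)
print(cdr3, length_difference)
```

Going the other way is not possible from CDR3_IMGT alone, which is why mapping JUNCTION straight to cdr3nt is the simpler route.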
Here's the conversion that seems to work. There are a few discrepancies in the conversion using the files @sdwfrost posted, but we think those are probably just D-segment alignment differences between MiGMAP and vanilla IgBLAST.
VEnd = V_GERM_LENGTH_IMGT - 312 + 3
DStart = VEnd + NP1_LENGTH
DEnd = DStart + D_GERM_LENGTH
JStart = DEnd + NP2_LENGTH
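For anyone who wants to check the arithmetic by hand, the four formulas can be sketched in Python as below. This is not the script mentioned in this post, just an illustration: the function name and the example values are hypothetical, and it assumes every length field is present and numeric. My reading of the constants (not stated above) is that 312 is the last IMGT-numbered nucleotide before the IMGT CDR3 and the + 3 adds back the conserved codon that the junction includes on the V side.

```python
def junction_coordinates(row):
    """Apply the four formulas to one Change-O record (dict of column name -> value)."""
    v_end = int(row["V_GERM_LENGTH_IMGT"]) - 312 + 3
    d_start = v_end + int(row["NP1_LENGTH"])
    d_end = d_start + int(row["D_GERM_LENGTH"])
    j_start = d_end + int(row["NP2_LENGTH"])
    return {"VEnd": v_end, "DStart": d_start, "DEnd": d_end, "JStart": j_start}

# Made-up record, with values as strings the way they come out of a TSV file
example = {
    "V_GERM_LENGTH_IMGT": "320",
    "NP1_LENGTH": "4",
    "D_GERM_LENGTH": "12",
    "NP2_LENGTH": "5",
}
coords = junction_coordinates(example)
print(coords)
```

Records with no D call or empty length fields would need extra handling that this sketch does not attempt.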
I posted a script to do the conversion on the Immcantation repo. Use like so:
> changeo2vdjtools.R file_db-pass.tsv
I didn't do careful testing, so if you hit any snags let me know.
I'll make an issue on the Change-O repo about adding this alignment info. There's probably a way we can incorporate it as optional output without introducing too much clutter from redundant info.