I’ve observed variants with mutated C/F/W in many of our datasets which are too abundant to be erroneous.
For example, in our old data (MIGEC paper):
There is a rank 3 clonotype supported by 0.5mln reads out of 37mln (1.5% freq) with no similar variant having higher frequency:
IGHV3-74 - CARDFRAGVPAEYS - IGHJ4
It's hard to believe it is an error, as it has high frequency and we used UMIs to correct errors.
Here is igblast mapping for it:
The variant appears to have some hypermutated subvariants:
So it is likely to be something biologically functional.
Another example for conserved Cys (this has a far lower frequency, 0.02%):
Thus, if the full-length sequence of an IG chain is known, I would define functional antibodies as those that have no stop codons when translated. For incomplete reads, we can exclude from the functional class cases with a stop codon or a frameshift in CDR3, cases with frameshift indels in V, and cases with V pseudogenes.
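
To make the proposed rule concrete, here is a minimal Python sketch of it. The function names and the `v_has_frameshift_indel` / `v_is_pseudogene` flags are hypothetical (they would come from whatever the mapper reports), and the nucleotide sequence in the usage note is made up to encode the CDR3 above; it is not the real read. Note that the conserved C/F/W residues are deliberately not checked.

```python
# Sketch only: classify a clonotype as functional without requiring
# the conserved C/F/W residues. All names here are illustrative.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt):
    """Translate in frame 0; incomplete trailing codon is ignored,
    codons with ambiguous bases become 'X'."""
    nt = nt.upper()
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X")
        for i in range(0, len(nt) - len(nt) % 3, 3)
    )


def is_functional_full_length(nt):
    """Full-length chain: functional if in frame and free of stop codons."""
    return len(nt) % 3 == 0 and "*" not in translate(nt)


def is_functional_partial(cdr3_nt, v_has_frameshift_indel, v_is_pseudogene):
    """Incomplete read: exclude a stop codon or frameshift in CDR3,
    frameshift indels in V, and V pseudogenes."""
    if v_has_frameshift_indel or v_is_pseudogene:
        return False
    if len(cdr3_nt) % 3 != 0:  # CDR3 length not a multiple of 3
        return False
    return "*" not in translate(cdr3_nt)
```

With a made-up sequence such as `TGTGCTCGTGATTTTCGTGCTGGTGTTCCTGCTGAATATTCT`, `translate` gives `CARDFRAGVPAEYS`, and `is_functional_partial(seq, False, False)` keeps it in the functional class even though the CDR3 does not end in W or F.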