Sponsored by the AIRR Community

How should we understand pseudogenes such as IGHVII, IGHVIII, and IGHVIV?

First, does MiXCR support analysis of IGHVII, IGHVIII, and IGHVIV?
I didn’t see those sequences in IMGT, only in NCBI. Why doesn’t IMGT include those sets of IGHV genes?

As long as your aligner accepts custom reference libraries, this is mainly a question of the library, not of the aligner. MiXCR does accept external libraries, as described in its documentation on Read the Docs (RTD).

IMGT maintains multiple libraries for various purposes, so when using their tools you need to make sure that you select the correct set of reference sequences: by default, pseudogenes are only included if they are in-frame.
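To check which entries of a downloaded reference set are annotated as pseudogenes, you could inspect the FASTA headers. The following is only a minimal sketch: it assumes IMGT-style pipe-delimited headers in which the second field is the allele name and the fourth field is the functionality (`F`, `ORF` or `P`, sometimes wrapped in parentheses or brackets). The accession numbers and the helper names `parse_imgt_header` and `is_pseudogene` are made up for illustration, so please verify the field layout against your own file.

```python
def parse_imgt_header(header):
    """Split a pipe-delimited FASTA header into the fields used here.

    Assumed layout: accession|allele|species|functionality|...
    """
    fields = header.lstrip(">").split("|")
    return {
        "accession": fields[0],
        "allele": fields[1],
        "species": fields[2],
        # Functionality may be written as P, (P) or [P]; strip the wrappers.
        "functionality": fields[3].strip("()[] "),
    }


def is_pseudogene(header):
    """Return True if the header's functionality field is 'P'."""
    return parse_imgt_header(header)["functionality"] == "P"


# Illustrative headers with placeholder accessions, not real database entries.
example_headers = [
    ">ACC0001|IGHV1-18*01|Homo sapiens|F|V-REGION|",
    ">ACC0002|IGHV(II)-1-1*01|Homo sapiens|P|V-REGION|",
]

pseudogenes = [
    parse_imgt_header(h)["allele"] for h in example_headers if is_pseudogene(h)
]
print(pseudogenes)
```

If the list comes back empty for your file, the reference set you downloaded probably does not contain the pseudogenes you are looking for.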

Further note that the IGHVII, IGHVIII and IGHVIV designations are problematic, as they are at variance with current HUGO guidelines [DOI:10.1038/s41588-020-0669-3], which recommend not to use Roman numerals in gene symbols (i.e., the “I” and “V” characters after “IGHV”). These numerals indicate the “clan” of a pseudogene in cases where a “subgroup” cannot be assigned with the necessary certainty. Internally, IMGT uses designations in which the numeral is flanked by parentheses, but as parentheses are forbidden in official gene symbols, they are simply discarded upon conversion. It is clearly a confusing system, but as these genes are only rarely identified, the problems are manageable.
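When comparing gene names between sources, the conversion described above (dropping the parentheses around the clan numeral) can be sketched in a few lines. This is only an illustration, not an official mapping: the function names `to_symbol` and `clan_of` are made up, and the patterns only cover IGHV names of the form discussed here.

```python
import re

# IMGT-internal style: clan numeral in parentheses, e.g. IGHV(II)-1-1
CLAN_IN_PARENS = re.compile(r"^IGHV\(([IVX]+)\)")
# Symbol style: clan numeral directly after IGHV, e.g. IGHVII-1-1
CLAN_NUMERAL = re.compile(r"^IGHV([IVX]+)-")


def to_symbol(imgt_name):
    """Drop the parentheses around the clan numeral, if present."""
    return CLAN_IN_PARENS.sub(r"IGHV\1", imgt_name)


def clan_of(symbol):
    """Return the Roman-numeral clan, or None for subgroup-named genes."""
    match = CLAN_NUMERAL.match(symbol)
    return match.group(1) if match else None


print(to_symbol("IGHV(II)-1-1"))  # IGHVII-1-1
print(clan_of("IGHVII-1-1"))      # II
print(clan_of("IGHV3-23"))        # None
```

Normalising both naming styles to one form before matching avoids missing a gene just because one source keeps the parentheses and the other does not.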

Thanks, bussec.
Very helpful!

I’m still wondering whether IGHVII, IGHVIII, and IGHVIV genes take part in IGH rearrangement, and whether clones using these pseudogene families generate a CDR3 region. I have never seen a V segment such as IGHVII, IGHVIII, or IGHVIV in our final data, so I’m curious how important the pseudogene families are.