How should we understand pseudogenes such as IGHVII, IGHVIII, and IGHVIV?

Tao_Sun · December 28, 2020, 8:14am

First, does MiXCR support analysis of IGHVII, IGHVIII, and IGHVIV?
I didn’t find those sequences on IMGT, only in NCBI. Why doesn’t IMGT include those sets of IGHV?

bussec · January 4, 2021, 5:22pm

As long as your aligner accepts custom reference libraries, this is mainly a question of the library, not of the aligner. MiXCR does accept external libraries, as described at RTD.

IMGT maintains multiple libraries for various purposes, so when using their tools you need to make sure that you select the correct set of reference sequences: by default, pseudogenes are only included if they are in-frame.
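If you assemble your own reference set, one way to control which genes end up in it is to filter on the functionality field of the FASTA headers. The following is only a minimal sketch: it assumes IMGT-style pipe-delimited headers in which the fourth field holds the functionality code (F, ORF, or P, sometimes wrapped in parentheses or brackets). The function names, accessions, and sequences are made up for illustration, so check them against the file you actually download.

```python
def imgt_functionality(header: str) -> str:
    """Return the functionality code from an IMGT-style FASTA header.

    Assumes the header is pipe-delimited and that the fourth field is the
    functionality (F, ORF or P); surrounding () or [] are stripped.
    """
    fields = header.lstrip(">").split("|")
    return fields[3].strip("()[] ")


def select_by_functionality(records, keep=("F", "ORF", "P")):
    """Keep only (header, sequence) pairs whose functionality is in `keep`."""
    return [(h, s) for h, s in records if imgt_functionality(h) in keep]


# Made-up records for illustration only (placeholder accessions and sequences).
records = [
    (">ACC0001|IGHV1-18*01|Homo sapiens|F|V-REGION|", "caggtt"),
    (">ACC0002|IGHV(II)-1-1*01|Homo sapiens|P|V-REGION|", "ctgcag"),
]

pseudogenes_only = select_by_functionality(records, keep=("P",))
print([h.split("|")[1] for h, _ in pseudogenes_only])
```

Passing `keep=("F", "ORF", "P")` retains everything, while `keep=("F",)` would give a functional-only library. Which of these is appropriate depends on whether you want pseudogene hits reported at all.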

Further note that the IGHVII, IGHVIII and IGHVIV designations are problematic, as they are at variance with current HUGO guidelines [DOI:10.1038/s41588-020-0669-3], which recommend not using Roman numerals in gene symbols (i.e., the “I” and “V” characters after “IGHV”). These numerals indicate the “clan” of a pseudogene in cases where a “subgroup” cannot be assigned with the necessary certainty. Internally, IMGT uses designations in which the numeral is flanked by parentheses, but as these are forbidden in official gene symbols, they are simply discarded upon conversion. It is clearly a confusing system, but as these genes are only rarely identified, the problems are manageable.
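To make the naming relationship concrete, here is a small sketch of the conversion described above, in which the parentheses around the clan numeral are dropped. The helper name `to_symbol` is hypothetical, and the pattern is deliberately restricted to IGHV names with a Roman numeral made of “I” and “V”; it is not an official conversion routine.

```python
import re

# Matches an IGHV name whose clan is written as a parenthesised Roman numeral,
# e.g. "IGHV(II)-1-1". Only the characters I and V are considered.
_CLAN_IN_PARENS = re.compile(r"^IGHV\(([IV]+)\)")


def to_symbol(imgt_name: str) -> str:
    """Drop the parentheses around the clan numeral; leave other names unchanged."""
    return _CLAN_IN_PARENS.sub(r"IGHV\1", imgt_name)


for name in ("IGHV(II)-1-1", "IGHV(III)-2-1", "IGHV1-18"):
    print(name, "->", to_symbol(name))
```

Names that carry a regular subgroup number, such as IGHV1-18, pass through untouched, so the same function can be applied to a whole library when matching IMGT-style names against parenthesis-free symbols from another source.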

Tao_Sun · January 6, 2021, 3:27am

Thanks, bussec.
Very helpful!

I’m still wondering whether IGHVII, IGHVIII, and IGHVIV can take part in IGH rearrangement, i.e. whether clones using these pseudogene families generate a CDR3 region. I have never seen a V segment such as IGHVII, IGHVIII, or IGHVIV in our final data, so I’m just curious how important these pseudogene families are.