Can I use MiXCR on incomplete IgH BCR?


#1

Hi all,

I am new to BCR-seq. We are going to use a multiplex PCR protocol to detect incomplete IgH (D-J) rearrangements in B-ALL patients, sequencing on an Illumina HiSeq with 2x150 bp reads. Since we are amplifying BCR DNA without the V segment, can I still use MiXCR to process the data? If yes, which parameters must I use in the alignment?

Previously I have only used the 5’ RACE method, with RNA as input, for BCR-seq.

Thank you very much!


#2

MiXCR can fetch data from incomplete reads and assemble clonotypes, for example in the case of RNA-Seq. However, the assumption is that you are dealing with “canonical” rearrangements and that all trimmed/missing V segments are due to read coverage. The problem in your case is that MiXCR will only align the D segment if both V and J were found, as D segment alignments are tricky. So for IGHD-IGHJ the answer is no. You can make a feature request at MiXCR/Issues.

As for your dataset, I can recommend the Vidjil software, which is optimized for handling incomplete rearrangements and cancer data; the list of supported events and loci is here.


#3

You may not be interested in other software, but getting partis to work on a sample that was about 2/3 DJ-only rearrangements just required adding whatever’s 5’ of the D in the DJ-only sequences as new germline V genes in the germline V file. Happy to send details if you’re interested.


#4

Thank you very much!


#5

Yes please, I would love to learn more about the software.


#6

The manual is hopefully self-contained, and it has links to the two papers.

For the DJ-only sample, I forget exactly how I got the “intronic” V genes, but it was something obvious/simple like sort/uniq’ing through whatever occurred to the left of each D gene. The end result was this germline set directory, where a V gene with a name like IGHVxDx1-1*01 is, as you’d imagine, the intronic sequence that occurred to the 5’ side of IGHD1-1*01. These are, uh, presumably universal for humans? Others are probably better qualified than I to comment on that. Use the --initial-germline-dir argument (see --help for details) to use this directory instead of the default.
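For anyone wanting to reproduce the "sort/uniq" step, here is a rough Python sketch of the idea, not the exact commands that were run. It assumes the D gene can be located in each read by exact string match and that a fixed-length window to its left is enough; the function names, `flank_len`, `min_count`, and the toy sequences are all made up for illustration.

```python
from collections import Counter, defaultdict


def flank_before_d(read, d_seq, flank_len):
    """Return the flank_len bases immediately 5' of an exact D match, or None."""
    idx = read.find(d_seq)
    if idx < flank_len:  # covers both "no match" (-1) and "flank too short"
        return None
    return read[idx - flank_len:idx]


def intronic_v_genes(reads, d_genes, flank_len=20, min_count=2):
    """Pick the most common 5' flank for each D gene (the sort | uniq -c step)."""
    flanks = defaultdict(Counter)
    for read in reads:
        for d_name, d_seq in d_genes.items():
            flank = flank_before_d(read, d_seq, flank_len)
            if flank is not None:
                flanks[d_name][flank] += 1

    genes = {}
    for d_name, counts in flanks.items():
        seq, n = counts.most_common(1)[0]
        if n >= min_count:
            # Name the pseudo-V gene after its D gene, e.g. IGHD1-1*01 -> IGHVxDx1-1*01
            genes[d_name.replace("IGHD", "IGHVxDx", 1)] = seq
    return genes


def to_fasta(genes):
    """Format the pseudo-V genes as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sorted(genes.items()))


# Toy input: invented sequences, not real germline data.
toy_d_genes = {"IGHD1-1*01": "GGTACAACTGG"}
toy_reads = [
    "ACGTACGTACGGTACAACTGGTTTT",
    "CCACGTACGTACGGTACAACTGGAAAA",
    "TTTTTTTTTTGGTACAACTGG",
]
print(to_fasta(intronic_v_genes(toy_reads, toy_d_genes, flank_len=10)))
```

The resulting records would then be appended to the V file of a germline directory, which is the directory you pass with --initial-germline-dir. Real reads have sequencing errors and trimmed D genes, so exact matching is only a starting point.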


#7

Thanks psathyrella.
I will read the manual and try the software.

Best.


#9

The link to the GL set directory seems to be broken/expired. Do you have an alternative URL?


#10

oops, yeah, I deleted it.

here, this won’t go away: https://github.com/psathyrella/intronic-germlines

(the intronic ones are at the bottom of the v file). Let me know if you have any issues running with it.


#11

Thanks @psathyrella for the new link.

Yes, there will be some minor differences but you can nicely see the RSS12 sites at the 3’ end of each of your VxDx sequences. However, I noticed that when aligning the sequences against the genome, the last base of the RSS12 (usually a “G”, at the transition to the actual D segment) was included in neither the D nor the VxDx list. Is it considered an N nucleotide by the algorithm?


#12

hm, yes, it is probably considering it an N nucleotide. Not for any particular reason – I generated these by grep/sed’ing by hand, and probably missed a base there. It sounds like it would make sense to have the full thing, so maybe you could submit a pull request on that repo, if you have the corrected versions?
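In case it helps with checking the other genes, a small sketch of how one might look for bases that fall between a VxDx sequence and its D gene in the genomic sequence. This assumes exact matches of both sequences against the reference; the function name and the toy sequences are invented for illustration and are not from the repo.

```python
def gap_between(genome, flank, d_seq):
    """Return the genomic bases between the end of flank and the next D match.

    Returns None if either sequence is not found by exact match.
    An empty string means the two are directly adjacent.
    """
    f_start = genome.find(flank)
    if f_start == -1:
        return None
    f_end = f_start + len(flank)
    d_start = genome.find(d_seq, f_end)
    if d_start == -1:
        return None
    return genome[f_end:d_start]


# Toy example: the flank stops one base short of the D gene.
toy_genome = "AAAACACTGTG" + "GGTACAACTGG" + "CCCC"
toy_flank = "AAAACACTGT"
toy_d = "GGTACAACTGG"

missing = gap_between(toy_genome, toy_flank, toy_d)
corrected_flank = toy_flank + missing if missing is not None else toy_flank
print(repr(missing), corrected_flank)
```

A non-empty result for a gene would be the bases to append to that VxDx entry so that nothing between it and the D gene is left unassigned.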


#13

Ok, done. If necessary, we can discuss the further details on GitHub.