Dear all,
I was wondering if anybody here has experience doing phylogenetics on BCR data? Specifically, creating a tree view of the somatic hypermutation (SHM) process for a set of sequences that share the same clonotype. I am currently giving it a shot by first partitioning my NGS sequences into clonal families (thanks to partis) and then using the inferred naive sequence together with the VH sequences from my NGS data as input to the phylogenetic analysis. Note that since the sequences within a partition belong to the same clonal family, they should also have the same length, so there should be no need to worry about multiple sequence alignment.
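To make the input step concrete, here is a minimal sketch of how I check the equal-length assumption and write one clonal family to a NEXUS data block. The function names and the taxon label "naive" are just my own placeholders, not anything from partis. The length check seems worth keeping because SHM can occasionally introduce indels, in which case an alignment would be needed after all.

```python
def check_equal_length(seqs):
    """Return the common sequence length, or raise if the lengths differ."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return lengths.pop()


def to_nexus(seqs):
    """Format a dict of {name: DNA sequence} as a NEXUS data block.

    Assumes names contain no whitespace and all sequences are already
    position-matched (same length, no alignment step needed).
    """
    nchar = check_equal_length(seqs)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(seqs)} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    for name, seq in seqs.items():
        lines.append(f"    {name}  {seq.upper()}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy family: inferred naive sequence plus two observed sequences.
    family = {"naive": "ATGGCTAAA", "seq1": "ATGGCCAAA", "seq2": "ATAGCTAAA"}
    print(to_nexus(family))
```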
I then use this as input to MrBayes, where I split the sites into three partitions, one for each codon position, each of which gets its own model parameters. Then I set a general substitution model with 6 parameters and run lots of iterations until convergence. I have also considered doing it at the amino acid level and using a substitution model appropriate for that.
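Roughly, the MrBayes setup described above can be sketched like this, with a small Python helper that writes the command block. A few assumptions: I read "6 variables" as nst=6 (a GTR-type model), the sequences start in frame at position 1, the naive sequence is called "naive" and is used as the outgroup, and the ngen/samplefreq values are placeholders rather than recommendations.

```python
def mrbayes_block(nchar, outgroup="naive", ngen=1_000_000, samplefreq=1000):
    """Return a MrBayes block that partitions sites by codon position.

    nchar is the alignment length; sites are assumed to be in frame
    starting at position 1.
    """
    lines = [
        "begin mrbayes;",
        # One character set per codon position (every third site).
        f"  charset pos1 = 1-{nchar}\\3;",
        f"  charset pos2 = 2-{nchar}\\3;",
        f"  charset pos3 = 3-{nchar}\\3;",
        "  partition by_codon = 3: pos1, pos2, pos3;",
        "  set partition = by_codon;",
        # nst=6 is my reading of the '6 variables' model (assumption).
        "  lset applyto=(all) nst=6;",
        # Let each partition estimate its own parameters and overall rate.
        "  unlink revmat=(all) statefreq=(all);",
        "  prset applyto=(all) ratepr=variable;",
        f"  outgroup {outgroup};",
        f"  mcmc ngen={ngen} samplefreq={samplefreq};",
        "  sump;",
        "  sumt;",
        "end;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mrbayes_block(nchar=360))
```

The block can be appended to the NEXUS file holding the sequences. Convergence still has to be judged from the run itself (for example, the diagnostics MrBayes prints during the run and in the sump output), not from a fixed number of generations.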
I think it can be argued that the DNA-level information is important for recreating the right phylogenetic picture of the SHM process, but what really drives selection is the amino acid identity at some key positions, and therefore I feel that some important information is left out when running the phylogeny at the DNA level.
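One simple way to look at what the DNA-level view adds or hides is to count, for each sequence, how many codons differ from the naive sequence synonymously versus non-synonymously. This is only a rough sketch under my own assumptions (in-frame sequences of equal length, standard genetic code, codons with ambiguous bases skipped). It is a descriptive tally, not a selection test.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_changes(naive, seq):
    """Count (synonymous, nonsynonymous) codon differences from naive.

    Both sequences are assumed to be in frame and of equal length.
    Codons containing characters outside ACGT are skipped.
    """
    if len(naive) != len(seq):
        raise ValueError("sequences must have the same length")
    naive, seq = naive.upper(), seq.upper()
    syn = nonsyn = 0
    for i in range(0, len(naive) - len(naive) % 3, 3):
        c0, c1 = naive[i:i + 3], seq[i:i + 3]
        if c0 == c1 or c0 not in CODON_TABLE or c1 not in CODON_TABLE:
            continue
        if CODON_TABLE[c0] == CODON_TABLE[c1]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


if __name__ == "__main__":
    naive = "ATGGCTAAA"
    print(codon_changes(naive, "ATGGCCAAA"))  # GCT -> GCC, both Ala
    print(codon_changes(naive, "ATAGCTAAA"))  # ATG -> ATA, Met -> Ile
```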
I would like to hear the opinions/experiences of anybody who has tried working on this problem. Is there a better method that I am unaware of (there very likely is…)? Am I doing anything wrong? Also, if/when there is a good solution to this practical problem, it should be written down so that others can replicate it, which I will commit to doing if I find the magic formula.