Thanks for sparking off such an interesting discussion!
A couple of simple observations on your mails which might help others, or help me if my thinking is wrong. I am not an expert on phylogeny and really appreciate the insights in this thread.
therefore no need to worry about multiple sequence alignments
In practice I find that the sequences I deal with often don't start at exactly the same nucleotide and sometimes have differing lengths; in the latter case, this is most likely because of sequencing errors. I find it easiest to start with an MSA and then remove any sequences that are grossly out of alignment. The approach I find works best is codon-aligned nucleotide alignment (you align the amino acid sequences and then map back to the underlying nucleotide sequences once the codon alignment is complete). To me this makes sense from a biological point of view, and in practice it eliminates some frustrating issues I've had with alignments at the nucleotide level, particularly in rabbit sequences, which feature a certain amount of gene conversion. TranslatorX (www.translatorx.com) does the codon alignment nicely and works with a choice of several aligners.
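For anyone who hasn't used this approach, here is a minimal sketch of the "map back to nucleotides" step, assuming you already have an amino acid alignment (with '-' for gaps) and the unaligned, in-frame nucleotide sequences. It is not how TranslatorX is implemented, just an illustration of the idea. The gap-fraction filter and its 0.5 cut-off are hypothetical stand-ins for "grossly out of alignment"; pick whatever criterion suits your data.

```python
def codon_align(aligned_protein: str, nucleotides: str) -> str:
    """Thread an unaligned, in-frame nucleotide sequence onto its aligned protein.

    Each amino acid consumes the next three nucleotides; each gap '-' in the
    protein alignment becomes a codon-sized gap '---'. Trailing nucleotides
    (e.g. a stop codon or a partial codon) are ignored.
    """
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codon = nucleotides[pos:pos + 3]
            if len(codon) < 3:
                raise ValueError("nucleotide sequence is shorter than its protein")
            codons.append(codon)
            pos += 3
    return "".join(codons)


def gap_fraction(aligned: str) -> float:
    """Fraction of alignment columns that are gaps in this sequence."""
    return aligned.count("-") / len(aligned) if aligned else 1.0


def drop_outliers(alignment: dict[str, str], max_gap_fraction: float = 0.5) -> dict[str, str]:
    """Hypothetical filter: discard sequences that are mostly gaps."""
    return {name: seq for name, seq in alignment.items()
            if gap_fraction(seq) <= max_gap_fraction}


protein_aln = {"seq1": "MK-V", "seq2": "MKLV"}
nucleotide_seqs = {"seq1": "ATGAAAGTT", "seq2": "ATGAAACTGGTT"}
codon_aln = {name: codon_align(protein_aln[name], nucleotide_seqs[name])
             for name in protein_aln}
print(codon_aln)  # {'seq1': 'ATGAAA---GTT', 'seq2': 'ATGAAACTGGTT'}
print(drop_outliers(codon_aln))
```

Because gaps are inserted a whole codon at a time, the reading frame is preserved in every row, which is what makes downstream synonymous/nonsynonymous comparisons straightforward.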
just more NGS errors
I also have to work with datasets that do not include UMIs. Although these are Illumina sets, I tend to cluster the nucleotide sequences at around 97% identity using UCLUST (drive5.com), with the intention of removing the noisy reads that creep in from oversampling. While there is a danger that this will remove useful sequences that would help infer an accurate phylogeny, my intuition is that it removes a lot more noise than signal. But I'd be interested to hear what other people think about this. Another approach would be to remove singletons. At the moment, the particular group I have been working with has not optimised read coverage against B-cell count. We are working in this direction, but there are some practical issues, and I think we would lose far too much detail if we eliminated singletons.
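To make the clustering idea concrete, here is a toy sketch of greedy centroid clustering at a 97% identity threshold, in the spirit of UCLUST but not a reimplementation of it. Assumptions: unique sequences are processed in order of decreasing abundance, identity is defined as 1 minus edit distance over the longer length, and a "singleton" is a cluster containing a single read. All function names are hypothetical, and the pairwise comparisons make it far too slow for real repertoire data.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution or match
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed identity measure: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(reads: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Assign each read to the first centroid it matches at >= threshold identity."""
    counts = Counter(reads)
    # Most abundant unique sequences first, so they tend to become centroids.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    clusters: dict[str, list[str]] = {}
    for seq in ordered:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].extend([seq] * counts[seq])
                break
        else:
            # No existing centroid is close enough: start a new cluster.
            clusters[seq] = [seq] * counts[seq]
    return clusters


def drop_singletons(clusters: dict[str, list[str]]) -> dict[str, list[str]]:
    """Discard clusters supported by only one read."""
    return {c: members for c, members in clusters.items() if len(members) > 1}


reads = ["A" * 100] * 3 + ["A" * 99 + "C", "C" * 100]
clusters = greedy_cluster(reads)
print({c[:5]: len(m) for c, m in clusters.items()})
print(list(drop_singletons(clusters)) == ["A" * 100])
```

The one-mismatch read (99% identity) is absorbed into the abundant cluster, while the unrelated read forms its own cluster and would be lost if singletons were removed, which is exactly the trade-off discussed above.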
only the synonymous mutations will be neutral
In our studies, the variation we see in a clonal family after two or three trial immunizations tends to include quite a few nonsynonymous mutations that are neutral in terms of binding affinity. I think this is because there are typically a number of positions in the CDRs where selection is effectively neutral, at least for some possible substitutions, so what you tend to see is the family 'cycling' between these variants until a fitter strain eventually emerges. And even when the antigen is not evolving, there may well be some natural variation, such as malformed sequences, that encourages such minor variation in the receptor.
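As a rough illustration of the distinction being drawn, here is a small sketch that counts synonymous and nonsynonymous codon changes between two aligned, in-frame nucleotide sequences (for example a germline and a clonal family member) using the standard genetic code. Simplifying assumptions: a codon with several substitutions counts as one change, and codons containing gaps or ambiguous bases are skipped. Note that this only says whether a change alters the protein; whether a nonsynonymous change is neutral for binding affinity, as described above, needs experimental data.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_changes(reference: str, variant: str) -> dict[str, int]:
    """Count codons that differ synonymously or nonsynonymously."""
    if len(reference) != len(variant) or len(reference) % 3:
        raise ValueError("sequences must be aligned, of equal length and in frame")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(reference), 3):
        ref_codon = reference[i:i + 3].upper()
        var_codon = variant[i:i + 3].upper()
        if ref_codon == var_codon:
            continue
        ref_aa = CODON_TABLE.get(ref_codon)
        var_aa = CODON_TABLE.get(var_codon)
        if ref_aa is None or var_aa is None:
            counts["skipped"] += 1          # gaps, Ns or other ambiguous codons
        elif ref_aa == var_aa:
            counts["synonymous"] += 1       # same amino acid
        else:
            counts["nonsynonymous"] += 1    # amino acid changes
    return counts


# AAA->AAG keeps Lys (synonymous); GTT->GCT changes Val to Ala (nonsynonymous).
print(classify_codon_changes("ATGAAAGTT", "ATGAAGGCT"))
```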