Dynamic Kernel Matching for predictive modelling on T-cell receptors and T-cell receptor repertoires

Most predictive models assume the features are arranged into rows and columns, like a spreadsheet, but many kinds of data do not conform to this structure. Sequences are one example: they vary in length and the order of their symbols matters, which is why sequence data is usually stored in a text file rather than a spreadsheet. To build predictive models for sequences and other non-conforming features, we have developed what we call dynamic kernel matching (DKM).
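To make the "non-conforming" point concrete, here is a minimal sketch (not taken from the DKM code). The sequences are made up for illustration and `one_hot` is a hypothetical helper. It shows that one-hot encoding sequences of different lengths gives blocks with different numbers of rows, so they cannot be stacked into a single spreadsheet-like table without padding or truncation.

```python
# Made-up amino-acid strings of different lengths, for illustration only.
sequences = ["CASSLGQAYEQYF", "CASSIRSSYEQYF", "CASSPGTGELFF", "CAWSVGNTIYF"]

# The 20 standard amino acids, one column per symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(sequence):
    """Hypothetical helper: encode a sequence as one 20-element row per residue."""
    return [[1 if aa == symbol else 0 for symbol in AMINO_ACIDS] for aa in sequence]

encoded = [one_hot(s) for s in sequences]

# Number of rows in each encoded sequence equals the sequence length.
lengths = [len(rows) for rows in encoded]

# A fixed-size table needs every sample to have the same shape.
is_rectangular = len(set(lengths)) == 1

print("rows per sequence:", lengths)
print("fits a fixed-size table:", is_rectangular)
```

A model that expects a fixed number of features per sample has no natural place to put the extra or missing rows, and that is the gap DKM is meant to address.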

We apply DKM to two datasets of T-cell receptors.

  1. The first dataset poses what we call the antigen classification problem. Given a sequenced T-cell receptor, we predict which of six disease antigens it binds.
  2. The second dataset poses what we call the repertoire classification problem. Given a sequenced T-cell receptor repertoire, we predict CMV serostatus. This model is built without knowing which T-cell receptors are CMV-specific.

The full results are in a manuscript we’ve submitted for peer review, but I thought this community might want the code, which is at https://github.com/jostmey/dkm.



Hi Jostmey,
This looks very interesting, with many potential applications in computational immunology.
