Dynamic Kernel Matching for predictive modelling on T-cell receptors and T-cell receptor repertoires

jostmey · February 25, 2020, 5:43pm

Most predictive models assume the features are arranged into rows and columns, like a spreadsheet, but many kinds of data do not conform to this structure. Sequences are one example: they vary in length, which is why they are usually stored in a text file rather than a spreadsheet. To build predictive models for sequences and other non-conforming features, we have developed what we call dynamic kernel matching (DKM).

We apply DKM to two datasets of T-cell receptors.

The first dataset poses what we call the antigen classification problem. From sequenced T-cell receptors, we predict which of six disease antigens each T-cell receptor binds.
The second dataset poses what we call the repertoire classification problem. From sequenced T-cell repertoires, we predict CMV serostatus. This model is built without knowledge of which T-cell receptors are CMV-specific.
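
To make the two setups concrete, here is a rough sketch of what a single example looks like in each problem and why neither fits a fixed set of columns. The class names, the antigen labels, and the sequences are placeholders made up for illustration; none of this is taken from the datasets or from the dkm code.

```python
from dataclasses import dataclass
from typing import List

# Placeholder labels: there are six antigens, but their names are not given here.
ANTIGEN_LABELS = [f"antigen_{i}" for i in range(1, 7)]


@dataclass
class ReceptorExample:
    """Antigen classification: one receptor sequence, one of six labels."""
    sequence: str  # amino-acid string, length varies between receptors
    antigen: str


@dataclass
class RepertoireExample:
    """Repertoire classification: many sequences per sample, one label."""
    sequences: List[str]  # number of sequences varies between samples
    cmv_positive: bool


# Made-up sequences, for illustration only.
receptors = [
    ReceptorExample("CASSLGTDTQYF", "antigen_1"),
    ReceptorExample("CASSPGQGNEQFF", "antigen_4"),
    ReceptorExample("CSARDRGYEQYF", "antigen_2"),
]

repertoires = [
    RepertoireExample(["CASSLGTDTQYF", "CASSPGQGNEQFF"], True),
    RepertoireExample(["CSARDRGYEQYF"], False),
]


def fits_fixed_columns(rows):
    """True only if every row has the same number of entries, as a spreadsheet needs."""
    return len({len(row) for row in rows}) <= 1


# Receptors differ in length, and repertoires differ in size,
# so neither can be stacked directly into rows and columns.
receptor_lengths_conform = fits_fixed_columns([r.sequence for r in receptors])
repertoire_sizes_conform = fits_fixed_columns([r.sequences for r in repertoires])
```

In this toy data both checks come out false, which is the sense in which the features are non-conforming.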

The full results are in a manuscript we’ve submitted for peer review, but I thought this community might want the code, which is on GitHub at jostmey/dkm: Dynamic Kernel Matching (DKM) for Classifying Data with Non-conforming Features.

Cheers

hkoohy · March 2, 2020, 8:13am

Hi Jostmey,
This looks very interesting, with many potential applications in the field of computational immunology.