Right now I’m mining published studies to extract TCR sequences with known HLA/antigen specificity.
I’ve organized everything using a fairly straightforward architecture (see this repository):
- All data is stored in a tab-delimited format, arguably the easiest to edit by hand
- Papers that potentially contain annotated TCR sequences are tracked as issues; submissions are managed via pull requests (I plan to implement a simple form for generating them), and CI checks each submission for consistency
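To give an idea of what a CI consistency check over the tab-delimited files could look like, here is a minimal sketch in Python. The column names (`cdr3`, `v_segment`, `j_segment`, `hla`, `antigen`, `reference`) and the `check_table` function are assumptions made up for illustration, not the actual layout or code of the repository.

```python
import csv
import re

# Hypothetical column names; the real table layout may differ.
REQUIRED_COLUMNS = ["cdr3", "v_segment", "j_segment", "hla", "antigen", "reference"]

# One-letter codes of the 20 standard amino acids.
CDR3_PATTERN = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]+$")


def check_table(handle):
    """Return a list of human-readable problems found in a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for row in reader:
        line_no = reader.line_num  # line of the record just read
        # Every required field must be present and non-empty.
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
        # The CDR3 must be a plain amino acid sequence.
        cdr3 = (row["cdr3"] or "").strip()
        if cdr3 and not CDR3_PATTERN.match(cdr3):
            problems.append(f"line {line_no}: invalid CDR3 sequence {cdr3!r}")
    return problems
```

A CI job could call `check_table` on every data file touched by a pull request and fail the build if the returned list is not empty, so that contributors get a line-by-line report before anything is merged.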
Please share your thoughts:
- Do you think such a database would be a useful resource for the community?
- Do you know of any similar initiatives?
Any comments on the design would be most appreciated.