Hackathon Project | TCR Repertoire Analysis
T-cell receptor (TCR) repertoires from cancer patients contain valuable signals, but the tools that identify which TCRs are "public" (shared across individuals and described in curated databases) are each tightly coupled to a single database. TCRMatch, for example, only queries IEDB. Researchers working with VDJdb, McPAS-TCR, or custom epitope databases have no unified solution.
Public.Match is a generalized TCR public-sequence matching tool that:
- Accepts patient TCR repertoires (CDR3α/β sequences) as input
- Searches across multiple curated databases — IEDB, VDJdb, McPAS-TCR, 10x Genomics pMHC, and others
- Returns matched public TCRs with epitope annotations, HLA restrictions, and match scores
- Is built with Claude Code, using AI-assisted development to rapidly prototype and extend the tool during the hackathon
Think of it as TCRMatch — but database-agnostic.
| Today | With Public.Match |
|---|---|
| Run TCRMatch → IEDB only | Single query → IEDB + VDJdb + McPAS + 10x |
| Manual format conversion per DB | Unified input/output schema |
| No cross-database deduplication | Merged, ranked hits across all sources |
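
As a rough illustration of the last row, one way to merge and rank hits from several sources is to key each hit on (CDR3β, epitope, MHC), keep the best score per key, and record every database that reported it. The `Hit` structure, the `merge_hits` helper, and the example scores are hypothetical and chosen only for this sketch; they are not Public.Match's actual API or real results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One database match for a query CDR3 (hypothetical structure)."""
    cdr3_beta: str
    epitope: str
    mhc: str
    source: str
    score: float


def merge_hits(hits):
    """Collapse duplicate hits across databases and rank them by score.

    Hits count as duplicates when CDR3β, epitope and MHC all agree.
    Returns a list of (key, best_score, sorted_sources), best score first.
    """
    merged = {}
    for hit in hits:
        key = (hit.cdr3_beta, hit.epitope, hit.mhc)
        best, sources = merged.get(key, (float("-inf"), set()))
        merged[key] = (max(best, hit.score), sources | {hit.source})
    ranked = [(key, score, sorted(sources))
              for key, (score, sources) in merged.items()]
    # Descending score, then key, so the order is reproducible.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Illustrative records only; the scores are made up for the example.
hits = [
    Hit("CASSIRSSYEQYF", "GILGFVFTL", "HLA-A*02:01", "IEDB", 1.0),
    Hit("CASSIRSSYEQYF", "GILGFVFTL", "HLA-A*02:01", "VDJdb", 1.0),
    Hit("CASSLAPGATNEKLFF", "NLVPMVATV", "HLA-A*02:01", "McPAS", 0.98),
]
for key, score, sources in merge_hits(hits):
    print(key[0], key[1], score, ",".join(sources))
```

Keeping the list of contributing sources per merged hit preserves provenance, which matters when databases disagree on annotation quality.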
Identifying public TCRs that recognize known epitopes helps distinguish antigen-specific from bystander T cells — a key step in spatial immunology pipelines like our own Soleil engagement scoring framework.
- Unify database schemas: normalize all sources into a common cdr3_alpha / cdr3_beta / epitope / mhc / v_gene / j_gene / source format
- Extend or wrap TCRMatch: reuse its edit-distance / GLIPH2-inspired scoring logic against any database
- Build a CLI: public-match --input repertoire.tsv --db iedb vdjdb mcpas 10x --score 0.97
- Claude Code as co-developer: use Claude Code to accelerate implementation, handle format parsing edge cases, and generate test cases
| Database | Entries | CDR3α | CDR3β | Epitope | HLA | Folder |
|---|---|---|---|---|---|---|
| IEDB | 226,280 TCR records | ✓ | ✓ | ✓ | ✓ | Databases/IEDB/ |
| VDJdb | 145,408 chain records | ✓ | ✓ | ✓ | ✓ | Databases/VDJdb/ |
| McPAS-TCR | 40,779 | ✓ | ✓ | ✓ | ✓ | Databases/McPAS/ |
| 10x Genomics pMHC | 189,515 cells / 4 donors | ✓ | ✓ | ✓ (55 pMHC) | ✓ | Databases/10xDcode/ |
| MixTCRpred | 17,715 αβ pairs | ✓ | ✓ | ✓ (146 pMHC) | ✓ | Databases/MixTCRpred/ |
| BATCAVE | 24,875 TCR–peptide measurements | ✓ | ✓ | ✓ (mutational scan) | ✓ | Databases/BATCAVE/ |
| OTS | 1.63M non-redundant paired αβ | ✓ | ✓ | — (publicness only) | — | Databases/OTS/ (manual download) |
All databases are mapped to a common record format:
cdr3_alpha CDR3 amino acid sequence of the alpha chain
cdr3_beta CDR3 amino acid sequence of the beta chain
epitope Epitope peptide sequence
mhc MHC/HLA restriction (e.g. HLA-A*02:01)
v_alpha TRAV gene
j_alpha TRAJ gene
v_beta TRBV gene
j_beta TRBJ gene
source Database of origin (e.g. IEDB, VDJdb, McPAS, 10x)
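
A minimal sketch of how a source-specific row could be mapped onto this record format. The `TCRRecord` dataclass, the `normalize_row` helper, and the example column mapping are assumptions made for illustration; the real column names of each database export should be checked against the files under Databases/.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TCRRecord:
    """Common record format; field names mirror the list above."""
    cdr3_alpha: Optional[str] = None
    cdr3_beta: Optional[str] = None
    epitope: Optional[str] = None
    mhc: Optional[str] = None
    v_alpha: Optional[str] = None
    j_alpha: Optional[str] = None
    v_beta: Optional[str] = None
    j_beta: Optional[str] = None
    source: Optional[str] = None


def normalize_row(row, column_map, source):
    """Map one source-specific row (a dict) onto a TCRRecord.

    column_map is {common_field: source_column}. Missing or blank values
    become None, so single-chain records remain valid.
    """
    known = {f.name for f in fields(TCRRecord)} - {"source"}
    unknown = set(column_map) - known
    if unknown:
        raise ValueError(f"unknown target fields: {sorted(unknown)}")
    values = {}
    for field_name, column in column_map.items():
        raw = (row.get(column) or "").strip()
        values[field_name] = raw or None
    # Store CDR3 sequences as upper-case amino acid strings.
    for chain in ("cdr3_alpha", "cdr3_beta"):
        if values.get(chain):
            values[chain] = values[chain].upper()
    return TCRRecord(source=source, **values)


# Hypothetical export with its own column names.
example_map = {"cdr3_beta": "CDR3.beta.aa", "epitope": "Epitope.peptide",
               "mhc": "MHC", "v_beta": "TRBV", "j_beta": "TRBJ"}
row = {"CDR3.beta.aa": "cassirssyeqyf ", "Epitope.peptide": "GILGFVFTL",
       "MHC": "HLA-A*02:01", "TRBV": "TRBV19", "TRBJ": ""}
record = normalize_row(row, example_map, source="custom")
print(record)
```

Holding one mapping dictionary per database keeps the source-specific knowledge in data rather than code, so adding a database would mostly mean adding a mapping.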
| Database | Entries | Notes |
|---|---|---|
| TCRdb 2.0 | ~700M sequences | Broad clinical coverage; no epitope labels |
| STCRDab | ~1,000 | 3D structural data from PDB |
| PIRD | large | Pan Immune Repertoire Database; China National GeneBank |
| ePytope-TCR datasets | 21 datasets / 762 epitopes | 2025 benchmarking collection on Zenodo |
git clone https://github.com/Marcus-Mendes/Public.Match
cd Public.Match
conda env create -f environment.yml
conda activate public-match
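pip install -e .

# beta-chain CDR3 sequences from FASTA (default chain mode)
python -m public_match --input sequences.fasta
# alpha-chain CDR3 sequences from FASTA
python -m public_match --input alpha_seqs.fasta --chain alpha
# paired mode with separate beta and alpha FASTA files
python -m public_match --input beta.fasta --input-alpha alpha.fasta --chain paired
# paired mode from a tabular file (TSV/CSV/AIRR)
python -m public_match --input repertoire.tsv --chain paired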
# explicit column names if auto-detection fails:
python -m public_match --input repertoire.tsv --chain paired \
    --seq-col cdr3_beta --alpha-col cdr3_alpha

# restrict the search to specific databases:
python -m public_match --input sequences.fasta --db iedb vdjdb mcpas

Available databases: iedb, vdjdb, mcpas, 10x, mixtcrpred, batcave, ots
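
For intuition, the two input styles shown above could be read along these lines: FASTA records become (name, sequence) pairs, and tabular input is scanned for a likely CDR3β column when --seq-col is not given. The candidate column names and the helper names are assumptions for this sketch and may not match how Public.Match performs its auto-detection.

```python
import csv
import io

# Candidate CDR3β column names, tried in order (an assumption for this sketch).
CDR3B_CANDIDATES = ("cdr3_beta", "cdr3b", "junction_aa", "cdr3")


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def read_table(handle, seq_col=None, delimiter="\t"):
    """Yield (row_index, CDR3β) pairs from a delimited file handle."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    columns = reader.fieldnames or []
    if seq_col is None:
        # Case-insensitive lookup of the first candidate that is present.
        lowered = {c.lower(): c for c in columns}
        seq_col = next((lowered[c] for c in CDR3B_CANDIDATES if c in lowered), None)
    if seq_col not in columns:
        raise ValueError(f"no CDR3β column found among {columns}")
    for index, row in enumerate(reader):
        sequence = (row[seq_col] or "").strip().upper()
        if sequence:  # skip rows with an empty CDR3β
            yield index, sequence


fasta = io.StringIO(">clone1 sample A\nCASSIRSSYEQYF\n>clone2\nCASSLAPG\nATNEKLFF\n")
table = io.StringIO("clone_id\tjunction_aa\nc1\tCASSIRSSYEQYF\nc2\t\n")
print(list(read_fasta(fasta)))
print(list(read_table(table)))
```

Both readers are generators, so a large repertoire file can be streamed rather than loaded into memory at once.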
# BLOSUM62-normalised score (default, 0–1; 0.97 = near-exact match)
python -m public_match --input sequences.fasta --method blosum --threshold 0.97
# Edit distance (integer; 1 = one substitution allowed)
python -m public_match --input sequences.fasta --method edit --threshold 1
# Exact match only
python -m public_match --input sequences.fasta --method exact

# search a custom database file:
python -m public_match --input sequences.fasta --custom-db my_db.csv
# specify the CDR3β column if it differs from the defaults:
python -m public_match --input sequences.fasta --custom-db my_db.tsv \
    --custom-db-cdr3-col junction_aa

Results are written to public_match_results.csv by default. Use --output to change the path:
python -m public_match --input sequences.fasta --output results/my_run.csv

Output columns: query_name, query_cdr3b (and query_cdr3a for paired mode), cdr3_alpha, cdr3_beta, epitope, mhc, v_alpha, j_alpha, v_beta, j_beta, source, score.
python -m public_match --help
--input/-i PATH Input file: FASTA or tabular (TSV/CSV/AIRR)
--output/-o PATH Output CSV (default: public_match_results.csv)
--db DB [DB ...] Databases to search (default: all)
--method {blosum,edit,exact} Matching method (default: blosum)
--threshold FLOAT Score threshold (default: 0.97)
--chain {beta,alpha,paired} Chain mode (default: beta)
--seq-col COL CDR3β column in tabular input
--alpha-col COL CDR3α column in tabular input
--name-col COL ID column in tabular input
--input-alpha PATH CDR3α FASTA for paired mode
--custom-db PATH [PATH ...] Custom database file(s)
--custom-db-cdr3-col COL CDR3β column in custom DB
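
To make the --method options more concrete, here is a small sketch of exact and edit-distance matching against a threshold. It uses a plain Levenshtein distance written for this example, which counts insertions and deletions as well as substitutions; Public.Match's own scoring code, and in particular its BLOSUM62-normalised score, may be implemented differently. The function names and the tiny example database are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def find_matches(query, database, method="edit", threshold=1):
    """Return (db_sequence, distance) pairs within the threshold, closest first."""
    if method == "exact":
        threshold = 0
    elif method != "edit":
        raise ValueError(f"unsupported method in this sketch: {method}")
    matches = []
    for candidate in database:
        # A length gap larger than the threshold can never be bridged.
        if abs(len(candidate) - len(query)) > threshold:
            continue
        distance = edit_distance(query, candidate)
        if distance <= threshold:
            matches.append((candidate, distance))
    return sorted(matches, key=lambda pair: (pair[1], pair[0]))


# Illustrative sequences only.
database = ["CASSIRSSYEQYF", "CASSIRSAYEQYF", "CASSLAPGATNEKLFF"]
print(find_matches("CASSIRSSYEQYF", database, method="edit", threshold=1))
print(find_matches("CASSIRSSYEQYF", database, method="exact"))
```

The length check before computing the distance is a cheap filter that skips most candidates in a large database without changing the result.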
A working CLI prototype that takes a CDR3 repertoire file and returns matched public TCRs from all selected databases, in a unified output format with a match score.
Built at Hackathon · 2026 · with Claude Code