Public.Match

Hackathon Project | TCR Repertoire Analysis

The Problem

T-cell receptor (TCR) repertoires from cancer patients contain valuable signals, but the tools for identifying which TCRs are "public" (shared across individuals and described in curated databases) are tightly coupled to a single database. TCRMatch, for example, only queries IEDB. Researchers working with VDJdb, McPAS-TCR, or custom epitope databases have no unified solution.

Our Idea

Public.Match is a generalized TCR public-sequence matching tool that:

Accepts patient TCR repertoires (CDR3α/β sequences) as input
Searches across multiple curated databases — IEDB, VDJdb, McPAS-TCR, 10x Genomics pMHC, and others
Returns matched public TCRs with epitope annotations, HLA restrictions, and match scores
Is built with Claude Code, using AI-assisted development to rapidly prototype and extend the tool during the hackathon

Think of it as TCRMatch — but database-agnostic.

Why It Matters

Today	With Public.Match
Run TCRMatch → IEDB only	Single query → IEDB + VDJdb + McPAS + 10x
Manual format conversion per DB	Unified input/output schema
No cross-database deduplication	Merged, ranked hits across all sources
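The "merged, ranked hits" idea in the last row can be sketched as follows. This is a minimal illustration, not Public.Match's actual code. It assumes hits are dicts that use the unified schema field names plus a `score`, and it treats two hits as the same record when `cdr3_alpha`, `cdr3_beta`, `epitope` and `mhc` all agree. `merge_hits` and `KEY_FIELDS` are hypothetical names, and the records below are toy values, not taken from any database.

```python
from typing import Dict, List, Tuple

# Fields that identify "the same" TCR-epitope record across databases (assumption).
KEY_FIELDS = ("cdr3_alpha", "cdr3_beta", "epitope", "mhc")


def merge_hits(hits: List[Dict]) -> List[Dict]:
    """Collapse duplicate hits reported by several sources and rank by score.

    A duplicate keeps the highest score seen and lists every source that reported it.
    The input dicts are not modified.
    """
    merged: Dict[Tuple, Dict] = {}
    for hit in hits:
        key = tuple(hit.get(field, "") for field in KEY_FIELDS)
        if key not in merged:
            # Copy the hit and start a list of sources for it.
            merged[key] = {**hit, "source": [hit["source"]]}
            continue
        entry = merged[key]
        if hit["source"] not in entry["source"]:
            entry["source"].append(hit["source"])
        entry["score"] = max(entry["score"], hit["score"])
    # Highest score first; join sources into one field for CSV output.
    ranked = sorted(merged.values(), key=lambda h: h["score"], reverse=True)
    for h in ranked:
        h["source"] = ";".join(h["source"])
    return ranked


# Toy records for illustration only.
hits = [
    {"cdr3_alpha": "", "cdr3_beta": "CASSIRSSYEQYF", "epitope": "GILGFVFTL",
     "mhc": "HLA-A*02:01", "source": "IEDB", "score": 0.98},
    {"cdr3_alpha": "", "cdr3_beta": "CASSIRSSYEQYF", "epitope": "GILGFVFTL",
     "mhc": "HLA-A*02:01", "source": "VDJdb", "score": 1.0},
    {"cdr3_alpha": "", "cdr3_beta": "CASSLAPGATNEKLFF", "epitope": "GLCTLVAML",
     "mhc": "HLA-A*02:01", "source": "McPAS", "score": 0.97},
]
for h in merge_hits(hits):
    print(h["cdr3_beta"], h["source"], h["score"])
```

Keying on the full epitope/MHC tuple keeps the same CDR3 annotated against two different epitopes as two separate hits, which is usually what a reviewer wants to see.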

Identifying public TCRs that recognize known epitopes helps distinguish antigen-specific from bystander T cells — a key step in spatial immunology pipelines like our own Soleil engagement scoring framework.

Approach

Unify database schemas: normalize all sources into a common cdr3_alpha / cdr3_beta / epitope / mhc / v_alpha / j_alpha / v_beta / j_beta / source format
Extend or wrap TCRMatch: reuse its BLOSUM62-based k-mer similarity scoring against any database, with edit-distance and exact matching as alternatives
Build a CLI: python -m public_match --input repertoire.tsv --db iedb vdjdb mcpas 10x --threshold 0.97
Claude Code as co-developer: use Claude Code to accelerate implementation, handle format parsing edge cases, and generate test cases
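As we understand the TCRMatch publication, its score is the k-mer kernel of Shen et al.: every pair of equal-length substrings of two CDR3s is compared residue by residue with BLOSUM62, the products are summed, and the total is normalised so that identical sequences score 1.0. The sketch below implements that kernel under stated assumptions. It is not TCRMatch's or Public.Match's code. It approximates BLOSUM62 odds ratios from the rounded integer table, uses an exponent `BETA = 0.11` (our assumption about the published value), and sums over all k-mer lengths, so its numbers will not exactly reproduce TCRMatch scores or its 0.97 cutoff. `k1`, `k3` and `similarity` are hypothetical names.

```python
import math

AA = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 in half-bit units; rows and columns follow the order of AA above.
_BLOSUM62_TEXT = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""
_ROWS = [line.split() for line in _BLOSUM62_TEXT.strip().splitlines()]
BLOSUM62 = {(a, b): int(s) for a, row in zip(AA, _ROWS) for b, s in zip(AA, row)}

# Kernel exponent (assumed value, see lead-in).
BETA = 0.11


def k1(x: str, y: str) -> float:
    # BLOSUM62 entries are 2*log2(odds ratio), so 2 ** (score / 2) approximately
    # recovers the odds ratio; the kernel raises that ratio to the power BETA.
    return 2.0 ** (BETA * BLOSUM62[(x, y)] / 2.0)


def k3(f: str, g: str) -> float:
    """Sum of k1 products over every pair of equal-length k-mers from f and g."""
    total = 0.0
    for i in range(len(f)):
        for j in range(len(g)):
            prod = 1.0
            # Grow the k-mer pair starting at (i, j); each length k adds one term.
            for t in range(min(len(f) - i, len(g) - j)):
                prod *= k1(f[i + t], g[j + t])
                total += prod
    return total


def similarity(f: str, g: str) -> float:
    """Normalised kernel score; identical sequences give 1.0 (up to rounding)."""
    f, g = f.upper(), g.upper()
    return k3(f, g) / math.sqrt(k3(f, f) * k3(g, g))


query = "CASSLGQAYEQYF"
for target in ("CASSLGQAYEQYF", "CASSLGQSYEQYF", "CAWSVGGNQPQHF"):
    print(target, round(similarity(query, target), 3))
```

The triple loop is cubic in CDR3 length, which is cheap for ~15-residue CDR3s but would need vectorisation or pre-filtering (for example by length) to scan hundreds of thousands of database entries per query. Non-standard residue letters raise a `KeyError`.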

Databases

Database	Entries	CDR3α	CDR3β	Epitope	HLA	Folder
IEDB	226,280 TCR records	✓	✓	✓	✓	`Databases/IEDB/`
VDJdb	145,408 chain records	✓	✓	✓	✓	`Databases/VDJdb/`
McPAS-TCR	40,779	✓	✓	✓	✓	`Databases/McPAS/`
10x Genomics pMHC	189,515 cells / 4 donors	✓	✓	✓ (55 pMHC)	✓	`Databases/10xDcode/`
MixTCRpred	17,715 αβ pairs	✓	✓	✓ (146 pMHC)	✓	`Databases/MixTCRpred/`
BATCAVE	24,875 TCR–peptide measurements	✓	✓	✓ (mutational scan)	✓	`Databases/BATCAVE/`
OTS	1.63M non-redundant paired αβ	✓	✓	— (publicness only)	—	`Databases/OTS/` (manual download)

Unified schema

All databases are mapped to a common record format:

cdr3_alpha    CDR3 amino acid sequence of the alpha chain
cdr3_beta     CDR3 amino acid sequence of the beta chain
epitope       Epitope peptide sequence
mhc           MHC/HLA restriction (e.g. HLA-A*02:01)
v_alpha       TRAV gene
j_alpha       TRAJ gene
v_beta        TRBV gene
j_beta        TRBJ gene
source        Database of origin (e.g. IEDB, VDJdb, McPAS, 10x, MixTCRpred, BATCAVE, OTS, or a custom database)
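One way to hold this record format in code is a dataclass plus a per-database column mapping. This is a sketch of the idea, not the project's implementation: `TCRRecord`, `normalize_row` and the column names in the example (`TRB_CDR3`, `Peptide`, `HLA`) are hypothetical, and empty strings stand in for fields a source does not report.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Mapping


@dataclass
class TCRRecord:
    """One row of the unified schema; "" means the source did not report it."""
    cdr3_alpha: str = ""
    cdr3_beta: str = ""
    epitope: str = ""
    mhc: str = ""
    v_alpha: str = ""
    j_alpha: str = ""
    v_beta: str = ""
    j_beta: str = ""
    source: str = ""


def normalize_row(row: Mapping[str, str], column_map: Dict[str, str], source: str) -> TCRRecord:
    """Map one raw database row onto the unified schema.

    column_map maps a unified field name (not "source") to the column name used
    by that database. Missing columns or empty cells become "".
    """
    values = {field: (row.get(col) or "").strip() for field, col in column_map.items()}
    # Upper-case amino-acid fields so later matching is case-insensitive.
    for field in ("cdr3_alpha", "cdr3_beta", "epitope"):
        if field in values:
            values[field] = values[field].upper()
    return TCRRecord(source=source, **values)


# Hypothetical raw row and column mapping for a custom table.
raw = {"TRB_CDR3": "cassirssyeqyf ", "Peptide": "GILGFVFTL", "HLA": "HLA-A*02:01"}
record = normalize_row(raw, {"cdr3_beta": "TRB_CDR3", "epitope": "Peptide", "mhc": "HLA"}, "custom")
print(asdict(record))
```

Keeping one mapping per database isolates format quirks at load time, so the matching and output code only ever see `TCRRecord` fields.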

Databases on the roadmap

Database	Entries	Notes
TCRdb 2.0	~700M sequences	Broad clinical coverage; no epitope labels
STCRDab	~1,000	3D structural data from PDB
PIRD	large	Pan Immune Repertoire Database; China National GeneBank
ePytope-TCR datasets	21 datasets / 762 epitopes	2025 benchmarking collection on Zenodo

Installation

git clone https://github.com/Marcus-Mendes/Public.Match
cd Public.Match
conda env create -f environment.yml
conda activate public-match
pip install -e .

Usage

Beta chain (default)

python -m public_match --input sequences.fasta
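FASTA input presumably holds one CDR3 per record. A minimal reader, assuming the record name is the first whitespace-separated token of the header (the tool's own parsing rules are not documented here; `read_fasta` is a hypothetical name and the sequences are toy values):

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA records into {name: sequence}.

    The name is the first whitespace-separated token after '>'; wrapped
    sequence lines are joined. Text before the first header is ignored, and a
    repeated name overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            header = line[1:].split()
            # Fall back to a positional name when the header is empty.
            name = header[0] if header else f"seq{len(records) + 1}"
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


example = io.StringIO(">clone1 count=12\nCASSIRSSYEQYF\n>clone2\nCASSLAPGAT\nNEKLFF\n")
print(read_fasta(example))
# {'clone1': 'CASSIRSSYEQYF', 'clone2': 'CASSLAPGATNEKLFF'}
```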

Alpha chain

python -m public_match --input alpha_seqs.fasta --chain alpha

Paired αβ — two FASTA files (matched by sequence name)

python -m public_match --input beta.fasta --input-alpha alpha.fasta --chain paired
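Pairing two FASTA files "by sequence name" amounts to a join on record names. A sketch, assuming both files have already been parsed into `{name: sequence}` dicts and that names must match exactly. Reporting unmatched names instead of silently dropping them is our design choice, not documented behaviour; `pair_by_name` is a hypothetical helper and the sequences are toy values.

```python
from typing import Dict, List, Tuple


def pair_by_name(
    beta: Dict[str, str], alpha: Dict[str, str]
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Join beta and alpha CDR3s that share a record name.

    Returns (pairs, unpaired): pairs are (name, cdr3_beta, cdr3_alpha) in the
    order of the beta file; unpaired lists names present in only one file.
    """
    pairs = [(name, seq, alpha[name]) for name, seq in beta.items() if name in alpha]
    unpaired = sorted(set(beta) ^ set(alpha))  # symmetric difference of names
    return pairs, unpaired


beta = {"clone1": "CASSIRSSYEQYF", "clone2": "CASSLAPGATNEKLFF"}
alpha = {"clone1": "CAGAGSQGNLIF", "clone3": "CAVNDYKLSF"}
pairs, unpaired = pair_by_name(beta, alpha)
print(pairs)     # [('clone1', 'CASSIRSSYEQYF', 'CAGAGSQGNLIF')]
print(unpaired)  # ['clone2', 'clone3']
```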

Paired αβ — single TSV/CSV with both columns

python -m public_match --input repertoire.tsv --chain paired
# explicit column names if auto-detection fails:
python -m public_match --input repertoire.tsv --chain paired \
  --seq-col cdr3_beta --alpha-col cdr3_alpha
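Column auto-detection can be as simple as checking the header against a list of common names, with `--seq-col` / `--alpha-col` taking precedence. The candidate lists below are hypothetical (the tool's real list is not documented); `cdr3_aa` and `junction_aa` are included because they are field names in the AIRR Rearrangement schema. The delimiter is guessed from the presence of a tab in the header line.

```python
import csv
import io
from typing import List, Optional, Sequence

# Hypothetical candidate names, checked in order (case-insensitive).
BETA_CANDIDATES = ("cdr3_beta", "cdr3b", "cdr3_b", "trb_cdr3", "cdr3_aa", "junction_aa")
ALPHA_CANDIDATES = ("cdr3_alpha", "cdr3a", "cdr3_a", "tra_cdr3")


def detect_column(header: Sequence[str], candidates: Sequence[str],
                  override: Optional[str] = None) -> str:
    """Return the column to use: an explicit override wins, otherwise the first
    candidate present in the header."""
    if override is not None:
        if override not in header:
            raise ValueError(f"column {override!r} not found in header {list(header)}")
        return override
    lookup = {col.lower(): col for col in header}
    for cand in candidates:
        if cand in lookup:
            return lookup[cand]
    raise ValueError(f"no CDR3 column found (tried {list(candidates)}); pass it explicitly")


def read_header(text: str) -> List[str]:
    """Read the header row of TSV or CSV text, choosing the delimiter by a simple check."""
    delimiter = "\t" if "\t" in text.splitlines()[0] else ","
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


text = "clone_id\tCDR3_Beta\tcdr3_alpha\nclone1\tCASSIRSSYEQYF\tCAGAGSQGNLIF\n"
header = read_header(text)
print(detect_column(header, BETA_CANDIDATES))   # CDR3_Beta
print(detect_column(header, ALPHA_CANDIDATES))  # cdr3_alpha
```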

Select specific databases

python -m public_match --input sequences.fasta --db iedb vdjdb mcpas

Available databases: iedb, vdjdb, mcpas, 10x, mixtcrpred, batcave, ots
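These keys and the folders in the Databases table suggest a small registry behind `--db`. A sketch, assuming keys are matched case-insensitively, repeats are ignored, and omitting `--db` selects every database (as the `--db` help text says). `DATABASE_DIRS` and `resolve_databases` are hypothetical names, loading the files inside each folder is not shown, and the OTS folder is only populated after its manual download.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Folder layout taken from the Databases table.
DATABASE_DIRS: Dict[str, Path] = {
    "iedb": Path("Databases/IEDB"),
    "vdjdb": Path("Databases/VDJdb"),
    "mcpas": Path("Databases/McPAS"),
    "10x": Path("Databases/10xDcode"),
    "mixtcrpred": Path("Databases/MixTCRpred"),
    "batcave": Path("Databases/BATCAVE"),
    "ots": Path("Databases/OTS"),
}


def resolve_databases(requested: Optional[Iterable[str]]) -> List[str]:
    """Validate --db choices; None or an empty list means every database."""
    if not requested:
        return list(DATABASE_DIRS)
    selected: List[str] = []
    for name in requested:
        key = name.lower()
        if key not in DATABASE_DIRS:
            raise ValueError(f"unknown database {name!r}; choose from {', '.join(DATABASE_DIRS)}")
        if key not in selected:  # ignore repeats such as --db iedb iedb
            selected.append(key)
    return selected


print(resolve_databases(["VDJdb", "iedb", "vdjdb"]))  # ['vdjdb', 'iedb']
```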

Matching methods and thresholds

# BLOSUM62-normalised score (default, 0–1; 0.97 = near-exact match)
python -m public_match --input sequences.fasta --method blosum --threshold 0.97

# Edit distance (integer; 1 = one substitution allowed)
python -m public_match --input sequences.fasta --method edit --threshold 1

# Exact match only
python -m public_match --input sequences.fasta --method exact
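The `edit` and `exact` methods can be illustrated with a Levenshtein distance and a threshold. This is a sketch, not the tool's code: the README does not say whether `--method edit` also counts insertions and deletions, so here one edit may be a substitution, an insertion or a deletion, and `exact` is treated as the threshold-0 case. `edit_distance` and `find_hits` are hypothetical names, and the database list is toy data.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when residues match)
            ))
        prev = curr
    return prev[-1]


def find_hits(query: str, db: List[str], method: str = "edit",
              threshold: int = 1) -> List[Tuple[str, int]]:
    """Return (db_cdr3, distance) for every entry within the threshold."""
    if method == "exact":
        threshold = 0
    hits = []
    for cdr3 in db:
        if abs(len(cdr3) - len(query)) > threshold:
            continue  # the length gap alone already exceeds the allowed edits
        d = edit_distance(query, cdr3)
        if d <= threshold:
            hits.append((cdr3, d))
    return hits


db = ["CASSIRSSYEQYF", "CASSIRSAYEQYF", "CASSLAPGATNEKLFF"]
print(find_hits("CASSIRSSYEQYF", db, method="edit", threshold=1))
# [('CASSIRSSYEQYF', 0), ('CASSIRSAYEQYF', 1)]
print(find_hits("CASSIRSSYEQYF", db, method="exact"))
# [('CASSIRSSYEQYF', 0)]
```

The length check is a cheap pre-filter: the edit distance between two strings is at least their length difference, so skipping those entries never drops a real hit.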

Custom database

python -m public_match --input sequences.fasta --custom-db my_db.csv
# specify the CDR3β column if it differs from the defaults:
python -m public_match --input sequences.fasta --custom-db my_db.tsv \
  --custom-db-cdr3-col junction_aa
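Loading a custom database mostly comes down to picking the delimiter and the CDR3β column. A sketch under assumptions: the delimiter is chosen from the file extension, `DEFAULT_CDR3_COLS` is a hypothetical stand-in for the tool's undocumented defaults, an optional `epitope` column is carried through if present, and hits are labelled with the file name as their source. `load_custom_db` is a hypothetical name.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical default CDR3β column names; the tool's real defaults are not listed.
DEFAULT_CDR3_COLS = ("cdr3_beta", "cdr3b", "cdr3")


def load_custom_db(path: str, cdr3_col: Optional[str] = None) -> List[Dict[str, str]]:
    """Read a CSV or TSV custom database into unified-schema-style dicts."""
    p = Path(path)
    delimiter = "\t" if p.suffix.lower() in (".tsv", ".txt") else ","
    with p.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        columns = reader.fieldnames or []
        # An explicit column (like --custom-db-cdr3-col) wins over the defaults.
        col = cdr3_col or next((c for c in DEFAULT_CDR3_COLS if c in columns), None)
        if col is None or col not in columns:
            raise ValueError(f"{p.name}: CDR3 column not found in {columns}")
        records = []
        for row in reader:
            cdr3 = (row.get(col) or "").strip().upper()
            if not cdr3:
                continue  # skip rows without a CDR3β sequence
            records.append({
                "cdr3_beta": cdr3,
                "epitope": (row.get("epitope") or "").strip(),
                "source": p.stem,  # label hits with the file name
            })
    return records
```

For a file such as `my_db.tsv` with a `junction_aa` column, `load_custom_db("my_db.tsv", cdr3_col="junction_aa")` would mirror the second command above.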

Output

Results are written to public_match_results.csv by default. Use --output to change the path:

python -m public_match --input sequences.fasta --output results/my_run.csv

Output columns: query_name, query_cdr3b (and query_cdr3a for paired mode), cdr3_alpha, cdr3_beta, epitope, mhc, v_alpha, j_alpha, v_beta, j_beta, source, score.
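Writing these columns with the standard library's `csv.DictWriter` could look like the sketch below. The placement of `query_cdr3a` directly after `query_cdr3b` follows the order in the sentence above but is an assumption about the real file, and `write_results` is a hypothetical name.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Column order as listed in the Output section.
QUERY_COLUMNS = ["query_name", "query_cdr3b"]
HIT_COLUMNS = ["cdr3_alpha", "cdr3_beta", "epitope", "mhc",
               "v_alpha", "j_alpha", "v_beta", "j_beta", "source", "score"]


def write_results(rows: Iterable[Dict], handle: TextIO, paired: bool = False) -> None:
    """Write hits as CSV; query_cdr3a is included only in paired mode."""
    columns = QUERY_COLUMNS + (["query_cdr3a"] if paired else []) + HIT_COLUMNS
    # restval fills fields a row lacks with ""; extra keys in a row are ignored.
    writer = csv.DictWriter(handle, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_results([{"query_name": "clone1", "query_cdr3b": "CASSIRSSYEQYF",
                "cdr3_beta": "CASSIRSSYEQYF", "source": "VDJdb", "score": 1.0}], buf)
print(buf.getvalue())
```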

All options

python -m public_match --help

  --input/-i PATH          Input file: FASTA or tabular (TSV/CSV/AIRR)
  --output/-o PATH         Output CSV (default: public_match_results.csv)
  --db DB [DB ...]         Databases to search (default: all)
  --method {blosum,edit,exact}  Matching method (default: blosum)
  --threshold FLOAT        Score threshold (default: 0.97)
  --chain {beta,alpha,paired}   Chain mode (default: beta)
  --seq-col COL            CDR3β column in tabular input
  --alpha-col COL          CDR3α column in tabular input
  --name-col COL           ID column in tabular input
  --input-alpha PATH       CDR3α FASTA for paired mode
  --custom-db PATH [PATH ...]  Custom database file(s)
  --custom-db-cdr3-col COL     CDR3β column in custom DB
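For readers extending the CLI, the documented options map directly onto `argparse`. This parser is a reconstruction from the help text above, not the project's source; `build_parser` is a hypothetical name, and no validation beyond the documented choices is attempted (for example, how an integer `--threshold` is handled for `--method edit` is left to the real tool).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the documented options (sketch)."""
    p = argparse.ArgumentParser(prog="public_match")
    p.add_argument("--input", "-i", required=True,
                   help="Input file: FASTA or tabular (TSV/CSV/AIRR)")
    p.add_argument("--output", "-o", default="public_match_results.csv")
    p.add_argument("--db", nargs="+", default=None,
                   help="Databases to search (default: all)")
    p.add_argument("--method", choices=["blosum", "edit", "exact"], default="blosum")
    p.add_argument("--threshold", type=float, default=0.97)
    p.add_argument("--chain", choices=["beta", "alpha", "paired"], default="beta")
    p.add_argument("--seq-col", help="CDR3β column in tabular input")
    p.add_argument("--alpha-col", help="CDR3α column in tabular input")
    p.add_argument("--name-col", help="ID column in tabular input")
    p.add_argument("--input-alpha", help="CDR3α FASTA for paired mode")
    p.add_argument("--custom-db", nargs="+", default=[], help="Custom database file(s)")
    p.add_argument("--custom-db-cdr3-col", help="CDR3β column in custom DB")
    return p


args = build_parser().parse_args(
    ["-i", "sequences.fasta", "--db", "iedb", "vdjdb", "--method", "edit", "--threshold", "1"]
)
print(args.input, args.db, args.method, args.threshold)
```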

Hackathon Deliverable

A working CLI prototype that takes a CDR3 repertoire file and returns matched public TCRs from all supported databases, with a unified output format and match score.

Built at Hackathon · 2026 · with Claude Code
