immGLIPH

An R implementation of the GLIPH and GLIPH2 algorithms for clustering T cell receptors (TCRs) predicted to bind the same HLA-restricted peptide antigen.

immGLIPH identifies specificity groups based on local (motif-based) and global (sequence-based) CDR3 similarities, then clusters them into convergence groups and scores each group for biological significance.
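To make the distinction between global and local similarity concrete, here is a minimal base R sketch on made-up CDR3 sequences. It is a simplified illustration and not immGLIPH's implementation: "global" is approximated as equal length with at most one mismatch, "local" as sharing a given k-mer, and the enrichment testing against a reference repertoire is omitted. The helper names `hamming`, `kmers`, and `has_motif` are hypothetical.

```r
# Toy CDR3 beta sequences, invented for illustration only
cdr3 <- c("CASSLAPGATNEKLFF", "CASSLAPGTTNEKLFF",
          "CASSPGQGNTEAFF", "CASRPGQGYEQYF")

# Global similarity (simplified): same length and Hamming distance <= 1
hamming <- function(a, b) {
  if (nchar(a) != nchar(b)) return(NA_integer_)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}
pairs <- t(combn(cdr3, 2))                      # all unordered pairs
d <- mapply(hamming, pairs[, 1], pairs[, 2])    # NA when lengths differ
global_pairs <- pairs[!is.na(d) & d <= 1, , drop = FALSE]

# Local similarity (simplified): sequences that contain a shared k-mer
kmers <- function(s, k) {
  n <- nchar(s) - k + 1
  if (n < 1) return(character(0))
  substring(s, seq_len(n), seq_len(n) + k - 1)
}
has_motif <- function(s, motif) motif %in% kmers(s, nchar(motif))
local_hits <- cdr3[vapply(cdr3, has_motif, logical(1), motif = "PGQG")]

global_pairs  # the two 16-residue sequences, which differ at one position
local_hits    # the two sequences that contain "PGQG"
```

In the toy data, the first two sequences form a global pair, while the last two differ in length and are linked only through the shared motif.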

Please cite the original publications:

  • GLIPH: Glanville, J. et al. Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94-98 (2017). doi:10.1038/nature22976

  • GLIPH2: Huang, H. et al. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nature Biotechnology 38, 1194-1202 (2020). doi:10.1038/s41587-020-0505-4

Installation

# Requires the devtools package: install.packages("devtools")
devtools::install_github("BorchLab/immGLIPH")

Reference Data

immGLIPH uses naive TCR repertoire reference databases for motif enrichment testing and cluster scoring. The reference data (~19 MB) is not bundled with the package to keep the install size small. Instead, it is downloaded automatically on first use from Zenodo and cached locally via BiocFileCache.

Setup

# Install BiocFileCache (one-time)
BiocManager::install("BiocFileCache")

# Pre-download the reference data (optional -- happens automatically on first runGLIPH() call)
library(immGLIPH)
ref <- getGLIPHreference()

Available References

| Name | Species | Subset | Source |
|------|---------|--------|--------|
| "human_v1.0_CD4" | Human | CD4 | Glanville et al. (2017) |
| "human_v1.0_CD8" | Human | CD8 | Glanville et al. (2017) |
| "human_v1.0_CD48" | Human | CD4+CD8 | Glanville et al. (2017) |
| "human_v2.0_CD4" | Human | CD4 | Huang et al. (2020) |
| "human_v2.0_CD8" | Human | CD8 | Huang et al. (2020) |
| "human_v2.0_CD48" | Human | CD4+CD8 | Huang et al. (2020) |
| "mouse_v1.0_CD4" | Mouse | CD4 | Huang et al. (2020) |
| "mouse_v1.0_CD8" | Mouse | CD8 | Huang et al. (2020) |
| "mouse_v1.0_CD48" | Mouse | CD4+CD8 | Huang et al. (2020) |
| "gliph_reference" | Human | CD4+CD8 | Legacy alias for human_v1.0_CD48 |

Select a reference with the refdb_beta parameter (default: "human_v2.0_CD48"):

# Human (default)
res <- runGLIPH(my_data, refdb_beta = "human_v2.0_CD48")

# Mouse
res <- runGLIPH(mouse_data, refdb_beta = "mouse_v1.0_CD48")

# Custom data frame
res <- runGLIPH(my_data, refdb_beta = my_custom_ref)

Each reference includes pre-computed V-gene usage and CDR3 length frequency distributions that are automatically used during cluster scoring.
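As a rough illustration of what V-gene usage and CDR3 length frequency distributions are, the following base R sketch computes both from a toy reference table. The column names `CDR3b` and `TRBV` and the sequences are assumptions made for this example; they are not taken from the immGLIPH documentation, so check the package help for the layout a custom reference actually needs.

```r
# Toy reference repertoire; column names are illustrative assumptions
ref <- data.frame(
  CDR3b = c("CASSLAPGATNEKLFF", "CASSPGQGNTEAFF",
            "CASSLGQAYEQYF", "CASSPGQGNTEAFF"),
  TRBV  = c("TRBV5-1", "TRBV7-9", "TRBV5-1", "TRBV7-9"),
  stringsAsFactors = FALSE
)

# CDR3 length frequency: share of sequences at each amino acid length
len_freq <- prop.table(table(nchar(ref$CDR3b)))

# V-gene usage: share of sequences carrying each V gene
v_freq <- prop.table(table(ref$TRBV))

len_freq
v_freq
```

Each distribution sums to 1 and describes how often a given length or V gene appears in the background repertoire.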

Rebuilding Reference Data

The build script downloads the raw reference files from the GLIPH web server, processes them, and saves the resulting reference_list.RData:

Rscript data-raw/build_reference_list.R

The output file should then be uploaded to Zenodo (see data-raw/build_reference_list.R for details).

Quick Start

library(immGLIPH)
data("gliph_input_data")

res <- runGLIPH(
  cdr3_sequences = gliph_input_data,
  method         = "gliph2",
  sim_depth      = 500,
  n_cores        = 2
)

# Convergence group scores
head(res$cluster_properties)

# Enriched motifs
head(res$motif_enrichment$selected_motifs)

Integration with scRepertoire

immGLIPH integrates with the scRepertoire ecosystem through immApex. runGLIPH() can directly accept Seurat objects, SingleCellExperiment objects, or combineTCR() output.

Documentation

See the package vignette for the full tutorial:

vignette("immGLIPH")

Validation

immGLIPH reproduces the published GLIPH and GLIPH2 cluster vectors on each paper's own dataset, when run with paper-matched parameters and (where applicable) the same post-hoc filtering criteria.

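| Dataset | immGLIPH configuration | n | ARI | NMI | Pairwise F1 | Precision | Recall |
|---------|------------------------|---|-----|-----|-------------|-----------|--------|
| Glanville 2017 | gliph1 + paper params | 144 | 0.985 | 0.994 | 0.985 | 1.000 | 0.971 |
| Huang 2020 | gliph2 + paper params + filter | 171 | 0.863 | 0.968 | 0.867 | 0.931 | 0.812 |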
The comparison universe is the set of CDR3s present in both the immGLIPH input and the published reference cluster output. For every metric, higher values indicate stronger agreement with the original tool's output.

For Glanville, every pair immGLIPH co-clustered was also co-clustered by the original GLIPH (precision = 1.000), and 97% of the original GLIPH co-clusterings were recovered (recall = 0.971). For Huang, precision is 0.931 and recall 0.812 against the curated 354-group GLIPH2 set.
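The pairwise metrics can be sketched in base R as follows. This uses a common definition of pairwise precision and recall over co-clustered pairs and assumes it matches the one in the benchmark repository, which holds the authoritative metric definitions. The function name `pairwise_scores` and the toy cluster labels are hypothetical.

```r
# Pairwise precision, recall, and F1 between two cluster assignments.
# A "positive" is a pair of items placed in the same cluster.
pairwise_scores <- function(pred, truth) {
  stopifnot(length(pred) == length(truth))
  idx <- combn(length(pred), 2)                        # all unordered pairs
  same_pred  <- pred[idx[1, ]]  == pred[idx[2, ]]      # co-clustered by pred
  same_truth <- truth[idx[1, ]] == truth[idx[2, ]]     # co-clustered by truth
  tp <- sum(same_pred & same_truth)
  precision <- tp / sum(same_pred)
  recall    <- tp / sum(same_truth)
  f1 <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f1 = f1)
}

# Toy labels for six CDR3s (invented for illustration)
truth <- c(1, 1, 1, 2, 2, 3)
pred  <- c(1, 1, 2, 2, 2, 3)
scores <- pairwise_scores(pred, truth)
scores
```

Under this definition F1 is the harmonic mean of precision and recall, which agrees with the table: 2 × 1.000 × 0.971 / (1.000 + 0.971) ≈ 0.985 for Glanville.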

The full pipeline (data prep, paper-matched parameter settings, post-hoc filter implementation, metric definitions, and a Quarto report) lives in the companion benchmark repository:

BorchLab/immGLIPH-benchmark

Bug Reports/New Features

If you run into a bug or other problem, please open a GitHub issue describing it.

Requests for new features or enhancements can also be submitted as GitHub issues.

Pull requests are welcome for bug fixes, new features, or enhancements.
