
omicscodeathon/tcellex


COMPARATIVE TRANSCRIPTOMIC ANALYSIS OF NON-SMALL CELL LUNG CANCER (LUAD AND LUSC): INTEGRATING BULK AND SINGLE-CELL RNA SEQUENCING TO MAP HISTOLOGICAL IMMUNE ARCHITECTURES

TABLE OF CONTENTS

BACKGROUND

PROJECT OVERVIEW

OBJECTIVES

WORKFLOW

PIPELINE ARCHITECTURE

SINGLE-CELL VALIDATION

CODE AVAILABILITY

T-CELL LINEAGE USED

KEY FINDINGS AND KEY TAKEAWAYS

REPOSITORY STRUCTURE

TOOLS & SOFTWARE

LICENSE

CONTRIBUTORS

ACKNOWLEDGEMENTS

Omics Codeathon General Application 2026, organized by the African Society for Bioinformatics and Computational Biology (ASBCB) with support from the NIH Office of Data Science Strategy.

BACKGROUND

Non-small cell lung cancer (NSCLC) is primarily categorized into two histological subtypes: lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). Although they share a common organ of origin, their immunological landscapes differ markedly. Understanding the histological drivers of T-cell exclusion is critical for optimizing subtype-specific immunotherapy and for explaining why certain tumors present as immune deserts.

PROJECT OVERVIEW

This project performs a comparative transcriptomic analysis of 16 high-throughput bulk RNA-seq datasets from The Cancer Genome Atlas (TCGA). The study characterizes T-cell marker expression, immune cell abundance and biological pathways in 8 LUAD samples versus 8 LUSC samples to delineate the molecular boundaries between these two histologies.

In addition, single-cell RNA sequencing data were incorporated to validate immune cell-specific findings at higher resolution.
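The sketch below illustrates the audit and feature-standardization steps. It assumes each cohort is loaded as a pandas genes-by-samples table of raw counts, indexed by versioned Ensembl gene IDs such as `ENSG00000141510.17`. The function names `audit_counts` and `harmonize` and the specific checks they run are hypothetical and do not come from the repository's scripts.

```python
import pandas as pd


def strip_ensembl_version(gene_ids: pd.Index) -> pd.Index:
    """Drop a trailing '.N' version suffix: 'ENSG00000141510.17' -> 'ENSG00000141510'."""
    return gene_ids.str.replace(r"\.\d+$", "", regex=True)


def audit_counts(counts: pd.DataFrame, label: str) -> dict:
    """Intra-group audit of a genes-by-samples raw count matrix (hypothetical checks)."""
    values = counts.to_numpy()
    return {
        "cohort": label,
        "n_genes": counts.shape[0],
        "n_samples": counts.shape[1],
        "negative_values": int((values < 0).sum()),
        "non_integer_values": int((values % 1 != 0).sum()),
        "min_library_size": int(counts.sum(axis=0).min()),
        "all_zero_genes": int((counts.sum(axis=1) == 0).sum()),
    }


def harmonize(luad: pd.DataFrame, lusc: pd.DataFrame) -> pd.DataFrame:
    """Standardize gene IDs, keep genes present in both cohorts, and merge them."""
    cleaned = []
    for df in (luad, lusc):
        df = df.copy()
        df.index = strip_ensembl_version(df.index)
        # Sum any rows that collapse to the same ID once versions are removed
        cleaned.append(df.groupby(level=0).sum())
    shared = cleaned[0].index.intersection(cleaned[1].index)
    merged = pd.concat([df.loc[shared] for df in cleaned], axis=1)
    # Drop genes with zero counts in every sample of both cohorts
    return merged.loc[merged.sum(axis=1) > 0]
```

Running `audit_counts` on each cohort before `harmonize` gives the intra-group audit. Running it again on the merged table gives a global audit.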

OBJECTIVES

  • Perform intra-group and global audits of LUAD and LUSC transcriptomic data.

  • Standardize gene features across the histology-specific cohorts.

  • Quantify and compare T-cell infiltration and deconvolution scores between subtypes.

  • Identify histological drivers of immune activity via differential gene expression (DGE) analysis.

  • Map functional biological pathways unique to the LUAD and LUSC microenvironments.

  • Validate bulk RNA-seq findings using single-cell RNA-seq data.

WORKFLOW

RNA-seq Workflow

PIPELINE ARCHITECTURE

PIPELINE
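The global PCA audit behind the PCA plot is one stage of this pipeline, and it can be sketched with NumPy's SVD. The sketch assumes log2-CPM normalization and uses the 500 most variable genes. Both choices are assumptions, and the project's own normalization and gene selection may differ.

```python
import numpy as np
import pandas as pd


def log_cpm(counts: pd.DataFrame, prior: float = 1.0) -> pd.DataFrame:
    """Library-size normalize a genes-by-samples count table to log2(CPM + prior)."""
    cpm = counts / counts.sum(axis=0) * 1e6
    return np.log2(cpm + prior)


def pca_scores(expr: pd.DataFrame, n_components: int = 2, top_genes: int = 500):
    """PCA of samples on the most variable genes via SVD of the centered matrix."""
    # Keep the most variable genes (rows), then transpose so samples are rows
    variable = expr.loc[expr.var(axis=1).nlargest(top_genes).index].T
    centered = variable - variable.mean(axis=0)
    u, s, _ = np.linalg.svd(centered.to_numpy(), full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s**2 / np.sum(s**2))[:n_components]  # variance ratio per PC
    cols = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(scores, index=variable.index, columns=cols), explained
```

Plotting PC1 against PC2 and colouring each point by histology gives the usual PCA audit view.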

SINGLE-CELL VALIDATION

To strengthen the findings from the bulk RNA-seq analysis, single-cell RNA sequencing (scRNA-seq) data were used to validate T-cell infiltration and immune activity patterns.

DATASET: Validation was performed using publicly available scRNA-seq data from GEO:

- Accession: GSE127465

- Data type: Single-cell RNA-seq of the human lung tumor microenvironment

RATIONALE: Bulk RNA-seq measures gene expression averaged across all cells in a sample, which may obscure cell-type-specific signals.

Single-cell RNA-seq allows validation at cellular resolution, confirming the presence and activity of specific immune populations.
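A toy calculation makes the averaging problem concrete. A gene's bulk expression is the fraction-weighted average of its expression in each cell type, so different cell compositions can produce the same bulk value. All numbers below are made up for illustration.

```python
import numpy as np


def bulk_signal(fractions, per_cell_expr) -> float:
    """Bulk expression as the fraction-weighted average of cell-type profiles."""
    fractions = np.asarray(fractions, dtype=float)
    per_cell_expr = np.asarray(per_cell_expr, dtype=float)
    if not np.isclose(fractions.sum(), 1.0):
        raise ValueError("cell-type fractions must sum to 1")
    return float(fractions @ per_cell_expr)


# Cell types: tumor, macrophage, CD8+ T-cell; expression of a cytotoxic marker.
# Sample A: few T-cells, each expressing the marker strongly
a = bulk_signal([0.85, 0.10, 0.05], [0.0, 0.5, 40.0])
# Sample B: twice as many T-cells, each expressing half as much
b = bulk_signal([0.80, 0.10, 0.10], [0.0, 0.5, 20.0])
print(a, b)  # both about 2.05: bulk cannot separate abundance from per-cell activity
```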

APPROACH:

  • The processed scRNA-seq expression matrix and metadata were used

  • T-cell activity was evaluated using marker genes

  • Gene scoring and visualization were performed to identify T-cell populations
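The sketch below shows the gene-scoring step. It assumes a pandas cells-by-genes matrix of log-normalized expression. Scanpy, the project's single-cell tool, provides `sc.tl.score_genes` for this kind of scoring. The `marker_score` function here is a hypothetical, simplified stand-in written with NumPy and pandas, not the project's code. It draws control genes uniformly at random instead of from expression-matched bins.

```python
import numpy as np
import pandas as pd


def marker_score(expr: pd.DataFrame, markers, n_ctrl: int = 50, seed: int = 0) -> pd.Series:
    """Per-cell score: mean of marker genes minus mean of random control genes.

    expr is a cells-by-genes matrix of log-normalized expression. This is a
    simplified relative of scanpy's sc.tl.score_genes, which draws control genes
    from expression-matched bins rather than uniformly at random.
    """
    present = [g for g in markers if g in expr.columns]
    if not present:
        raise ValueError("none of the marker genes are in the matrix")
    others = expr.columns.difference(present)
    if len(others) == 0:
        raise ValueError("no non-marker genes left to use as controls")
    rng = np.random.default_rng(seed)
    ctrl = rng.choice(others.to_numpy(), size=min(n_ctrl, len(others)), replace=False)
    return expr[present].mean(axis=1) - expr[list(ctrl)].mean(axis=1)
```

For example, `marker_score(expr, ["CD3D", "CD3E", "CD8A", "GZMB"])` scores each cell on markers listed in this README. Markers missing from the matrix are skipped. Any cut-off for calling a cell "T-cell-like" would be a separate choice, not a value from this project.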

VALIDATION OUTCOME:

  • Confirmed the presence of T-cell populations within tumor samples

  • Cytotoxic markers (CD8A, GZMB) indicated an active immune response

  • Regulatory (FOXP3) and inhibitory checkpoint (PDCD1) markers suggested immune-suppression mechanisms
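Another way to summarize marker-based calls is the percentage of cells in each group, such as each cluster, with detectable expression of each marker. The pandas sketch below assumes a cells-by-genes matrix and a per-cell group label. The grouping variable and the "non-zero means detected" rule are assumptions.

```python
import pandas as pd

CYTOTOXIC = ["CD8A", "GZMB"]
REGULATORY_INHIBITORY = ["FOXP3", "PDCD1"]


def percent_expressing(expr: pd.DataFrame, groups: pd.Series, genes) -> pd.DataFrame:
    """Percentage of cells per group with non-zero expression of each gene.

    expr: cells-by-genes matrix; groups: Series mapping each cell ID to a group label.
    """
    detected = expr[genes].gt(0)  # True where the gene is detected in a cell
    return detected.groupby(groups).mean().mul(100).round(1)
```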

INTERPRETATION:

The single-cell analysis supports bulk RNA-seq findings by demonstrating that:

- LUAD exhibits higher T-cell activity, consistent with an immune "hot" phenotype

- LUSC shows reduced immune presence, consistent with an immune "desert" phenotype

This validation strengthens confidence in the observed histological immune differences.

CODE AVAILABILITY

All Python scripts for the T-cell project are available in the repository:

👉 Browse the scripts: View scripts

T-CELL LINEAGE USED

  • Core T-cell populations:

i) CD8+ cytotoxic T-cells (CD8A, CD8B)

ii) CD4+ helper T-cells (CD4)

iii) Regulatory T-cells (Tregs)

  • Pan-T-cell markers:

i) CD3 complex (CD3D, CD3E), used to identify T-cell presence regardless of subtype

  • Functional effector cells:

i) Activated cytotoxic lymphocytes (GZMB, PRF1)

  • Comparative non-T-cell populations (identified through deconvolution):

i) Natural Killer(NK) cells

ii) Macrophages
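The lineage panel above can be encoded as a dictionary and used to compute a simple per-sample infiltration score from bulk data. The sketch assumes a genes-by-samples log-expression table indexed by gene symbol and scores each lineage as the mean z-score of its markers. This scoring rule is an assumption, because the project's infiltration and deconvolution methods are not described here. Using FOXP3 as the Treg marker is also an assumption, borrowed from the regulatory markers named under single-cell validation.

```python
import numpy as np
import pandas as pd

# Marker panel from the lineage list above; FOXP3 for Tregs is an assumption.
T_CELL_PANEL = {
    "CD8_cytotoxic": ["CD8A", "CD8B"],
    "CD4_helper": ["CD4"],
    "Treg": ["FOXP3"],
    "CD3_pan_T": ["CD3D", "CD3E"],
    "cytotoxic_effector": ["GZMB", "PRF1"],
}


def lineage_scores(log_expr: pd.DataFrame, panel: dict) -> pd.DataFrame:
    """Samples-by-lineages table of mean marker z-scores.

    log_expr: genes-by-samples log-scale expression indexed by gene symbol.
    Lineages with no marker present in log_expr are left out.
    """
    sd = log_expr.std(axis=1, ddof=0).replace(0, np.nan)  # constant genes -> NaN
    z = log_expr.sub(log_expr.mean(axis=1), axis=0).div(sd, axis=0)
    scores = {}
    for lineage, genes in panel.items():
        present = [g for g in genes if g in z.index]
        if present:
            scores[lineage] = z.loc[present].mean(axis=0)
    return pd.DataFrame(scores)


def median_by_subtype(scores: pd.DataFrame, subtype: pd.Series) -> pd.DataFrame:
    """Median lineage score per histology (subtype maps sample -> 'LUAD' or 'LUSC')."""
    return scores.groupby(subtype).median()
```

The per-subtype medians are the kind of summary compared under Key Findings. Z-scores are computed across all samples, so each score is relative to this cohort only.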

KEY FINDINGS AND KEY TAKEAWAYS

  • Histological distinction:

PCA separates LUAD and LUSC samples, indicating distinct global transcriptomic fingerprints despite their shared organ of origin. View PCA plot

  • Immune Architecture:

LUAD exhibits a more consistent immune-active microenvironment, characterized by higher median T-cell infiltration scores. View Subtype Comparison

  • T-cell Signaling:

Key cytotoxic markers (CD8A, GZMB) differ significantly between histologies, reflecting the "hot" vs. "cold" nature of these tumors. View Volcano Plot

  • Proliferative Tradeoff:

LUSC shows a hyper-proliferative signature (mitotic spindle organization), which is associated with a "colder", more immune-excluded profile. View LUSC Pathways

  • Cellular composition:

Digital deconvolution reveals a higher abundance of cytotoxic lymphocytes in LUAD, while macrophages remain a stable component in both histologies. View Deconvolution Heatmap

  • Functional Mechanism:

LUAD is significantly enriched in pathways related to antigen processing and apoptotic cell clearance. View LUAD Pathways

  • Validation of findings:

Single-cell RNA-seq analysis confirms the presence of T-cell populations:

  1. T-cell abundance across cell populations supports the observed immune-active (LUAD) and immune-desert (LUSC) phenotypes. View T-cell Abundance Validation

  2. UMAP visualization further highlights the clustering of immune cells and the distribution of T-cell populations across the dataset. View UMAP Validation Atlas
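Both the volcano-plot labels and the pre-ranked input for pathway enrichment can be derived from a DESeq2-style results table, such as PyDESeq2's `results_df`. That table includes `log2FoldChange`, `stat` and `padj` columns. The thresholds below (padj < 0.05, |log2FC| >= 1) are common defaults assumed for illustration and are not confirmed as this project's values. The sign of `log2FoldChange` depends on which histology is the reference level in the contrast.

```python
import numpy as np
import pandas as pd


def classify_degs(res: pd.DataFrame, lfc: float = 1.0, alpha: float = 0.05) -> pd.DataFrame:
    """Label genes 'up', 'down' or 'ns' for a volcano plot.

    Expects 'log2FoldChange' and 'padj' columns; rows missing either are dropped.
    """
    out = res.dropna(subset=["log2FoldChange", "padj"]).copy()
    out["neg_log10_padj"] = -np.log10(out["padj"].clip(lower=1e-300))  # y-axis
    up = (out["padj"] < alpha) & (out["log2FoldChange"] >= lfc)
    down = (out["padj"] < alpha) & (out["log2FoldChange"] <= -lfc)
    out["status"] = np.select([up, down], ["up", "down"], default="ns")
    return out


def prerank(res: pd.DataFrame) -> pd.Series:
    """Genes ranked by the Wald statistic, highest first, for pre-ranked enrichment."""
    return res["stat"].dropna().sort_values(ascending=False)
```

The ranked Series can be passed as the `rnk` argument of GSEApy's `prerank` function, together with a gene-set collection.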

REPOSITORY STRUCTURE

  • Data - raw and audited count matrices (LUAD/LUSC)

  • Scripts - Python scripts for auditing, DGE and plotting

  • Results - statistical output tables (DEGs, enrichment scores)

  • Figures - generated QC plots, heatmaps and volcano plots

  • accessions/accessions.txt - list of TCGA case IDs and file UUIDs

  • validation/
      - data - scRNA-seq dataset
      - scripts - validation scripts
      - figures - UMAP and barplot validation plots
      - results - validation outputs
      - README.md - validation documentation

  • README.md - general project documentation

  • .gitignore - files to exclude from version control

  • license - MIT license

TOOLS & SOFTWARE

Language: Python 3.10+

Statistics: PyDESeq2, GSEApy

Data handling: pandas, NumPy

Visualization: Matplotlib, Seaborn, Bioinfokit

APIs: MyGene.info

Single-cell analysis: Scanpy

LICENSE

License: MIT

CONTRIBUTORS

  1. Mbaoji Florence Nwakaego, Department of Pharmacology and Toxicology, Faculty of Pharmaceutical Sciences, University of Nigeria Nsukka, Nsukka, Enugu State, Nigeria

  2. Chemutai Queen, Department of Biochemistry, Faculty of Biomedical Sciences, Jomo Kenyatta University of Agriculture and Technology, Kenya

  3. John Nnaemeka Nkwocha, Department of Biochemistry, University of Port Harcourt, Choba, Rivers State, Nigeria

  4. Usman Yalwa, MSc Bioinformatics student, Kalinga University, India

  5. Mark Matthew Edet, Department of Morphological Veterinary Medicine, Chungbuk National University, South Korea; Department of Human Biochemistry, Faculty of Basic Medical Sciences, Nnamdi Azikiwe University, Nnewi, Nigeria

  6. Zilungile Coki, SSc(HONS) Biotechnology student, University of the Western Cape, South Africa

  7. Oluwaseun Martins Olowabi, Cancer Research and Molecular Biology, Department of Biochemistry, University of Ibadan

  8. Valerie Martins, Department of Cell Biology and Genetics, University of Lagos, Akoka

  9. Eva Akurut, Department of Immunology and Molecular Biology, College of Health Sciences, Makerere University, Kampala, Uganda; The African Centre of Excellence in Bioinformatics and Data-Intensive Sciences, Infectious Diseases Institute, Makerere University, Kampala, Uganda

  10. Olaitan I. Awe, African Society for Bioinformatics and Computational Biology (ASBCB), Cape Town, South Africa (Project Advisor)

ACKNOWLEDGEMENTS

We thank the NIH Office of Data Science Strategy for their support before and during the October 2026 Omics Codeathon, co-organized with the African Society for Bioinformatics and Computational Biology (ASBCB). We also thank Dr. Awe for his ongoing guidance and all collaborators who contributed to this project.

About

T Cell Expression Analysis in Lung Cancer using Bioinformatics
