COMPARATIVE TRANSCRIPTOMIC ANALYSIS OF NON-SMALL CELL LUNG CANCER (LUAD AND LUSC): INTEGRATING BULK AND SINGLE-CELL RNA SEQUENCING TO MAP HISTOLOGICAL IMMUNE ARCHITECTURES
Omics Codeathon General Application 2026
Organized by the African Society for Bioinformatics and Computational Biology (ASBCB) with support from the NIH Office of Data Science Strategy.
BACKGROUND
Non-small cell lung cancer (NSCLC) is primarily categorized into two histological subtypes: lung adenocarcinoma (LUAD)
and lung squamous cell carcinoma (LUSC). Although they share a common organ of origin, their immunological landscapes differ
markedly. Understanding the histological drivers of T-cell exclusion is critical for optimizing subtype-specific immunotherapy
and for explaining why certain tumors present as immune deserts.
PROJECT OVERVIEW
This project performs a comparative transcriptomic analysis of 16 high-throughput datasets from The Cancer Genome Atlas (TCGA).
The study characterizes T-cell marker expression, immune cell abundance, and biological pathways in 8 LUAD samples versus 8
LUSC samples to delineate the molecular boundaries between these two histologies.
Additionally, single-cell RNA sequencing data was incorporated to validate immune cell-specific findings at higher resolution.
OBJECTIVES
- Perform intra-group and global audits of LUAD and LUSC transcriptomic data.
- Standardize genetic features across histology-specific cohorts.
- Quantify and compare T-cell infiltration and deconvolution scores between subtypes.
- Identify histological drivers of immune activity via Differential Gene Expression (DGE).
- Map functional biological pathways unique to the LUAD and LUSC microenvironments.
- Validate bulk RNA-seq findings using single-cell RNA-seq data.
WORKFLOW
PIPELINE ARCHITECTURE
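As an illustration of one early stage, feature standardization across the two cohorts, here is a minimal dependency-free sketch. It assumes gene IDs arrive as versioned Ensembl identifiers (for example `ENSG00000153563.16`), which is an assumption about the input format rather than something this README states. The function names and toy counts are hypothetical and are not taken from the project scripts.

```python
def strip_version(gene_id: str) -> str:
    """Drop an Ensembl-style version suffix, e.g. 'ENSG00000153563.16' -> 'ENSG00000153563'."""
    return gene_id.split(".", 1)[0]


def standardize(counts: dict[str, list[int]]) -> dict[str, list[int]]:
    """Collapse rows that share an unversioned gene ID by summing their counts."""
    merged: dict[str, list[int]] = {}
    for gene_id, row in counts.items():
        key = strip_version(gene_id)
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], row)]
        else:
            merged[key] = list(row)
    return merged


def shared_genes(*cohorts: dict[str, list[int]]) -> list[str]:
    """Genes present in every cohort, sorted so row order is reproducible."""
    common = set(cohorts[0])
    for cohort in cohorts[1:]:
        common &= set(cohort)
    return sorted(common)


# Toy count tables (hypothetical values): gene ID -> raw counts per sample.
luad_raw = {
    "ENSG00000153563.16": [120, 98],
    "ENSG00000100453.13": [40, 55],
    "ENSG00000000003.15": [7, 9],
}
lusc_raw = {
    "ENSG00000153563.15": [30, 22],  # same gene, different annotation version
    "ENSG00000100453.13": [10, 12],
}

luad = standardize(luad_raw)
lusc = standardize(lusc_raw)

# Restrict both cohorts to the same genes in the same order.
genes = shared_genes(luad, lusc)
luad_matrix = [luad[g] for g in genes]
lusc_matrix = [lusc[g] for g in genes]
print(genes)
```

Aligning both matrices to one sorted gene list means downstream steps such as DGE can treat a row index as the same gene in either cohort.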
SINGLE-CELL VALIDATION
To strengthen the findings from the bulk RNA-seq analysis, single-cell RNA sequencing (scRNA-seq) data was used to validate T-cell infiltration
and immune activity patterns.
DATASET:
Validation was performed using publicly available scRNA-seq data from GEO:
- Accession: GSE127465
- Data type: Single-cell RNA-seq of human lung tumor microenvironment
RATIONALE:
Bulk RNA-seq provides gene expression averaged across all cells, which may obscure cell-specific signals.
Single-cell RNA-seq allows validation at cellular resolution, confirming the presence and activity of specific immune populations.
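A small worked example of this dilution effect, using invented numbers: if only a small fraction of cells in a sample express a marker, the bulk average can look weak even though the expressing cells are highly active.

```python
from statistics import mean

# Hypothetical tumor sample: 1,000 cells, 50 of which are cytotoxic T cells.
# Each T cell expresses the marker at 100 units; every other cell at 0.
n_cells = 1000
n_t_cells = 50

per_cell_expression = [100.0] * n_t_cells + [0.0] * (n_cells - n_t_cells)

# Bulk RNA-seq effectively reports an average over every cell in the sample.
bulk_signal = mean(per_cell_expression)

# Single-cell data keeps per-cell values, so the expressing population stays visible.
expressing = [x for x in per_cell_expression if x > 0]
single_cell_signal = mean(expressing)

print(f"bulk average: {bulk_signal}")
print(f"mean among expressing cells: {single_cell_signal} ({len(expressing)} cells)")
```

In this toy case the bulk value is twenty times lower than the level in the cells that actually express the marker. That gap is why the bulk results are cross-checked at cellular resolution.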
APPROACH:
- Processed scRNA-seq expression matrix and metadata were used.
- T-cell activity was evaluated using marker genes.
- Gene scoring and visualization were performed to identify T-cell populations.
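The gene-scoring step can be sketched in simplified form as follows. Scanpy provides `scanpy.tl.score_genes`, which subtracts the average of expression-matched, randomly sampled control genes. The version below swaps that for a fixed reference set so it stays dependency-free and easy to check by hand. The reference genes, expression values, and threshold are all illustrative assumptions, and this is not the project's validation script.

```python
from statistics import mean

# Marker genes named in this README; the reference genes are an illustrative assumption.
T_CELL_MARKERS = ["CD3D", "CD3E", "CD8A", "GZMB"]
REFERENCE_GENES = ["ACTB", "GAPDH"]


def marker_score(cell: dict[str, float], markers: list[str], reference: list[str]) -> float:
    """Mean marker expression minus mean reference expression for one cell.

    Genes missing from the cell are treated as unexpressed (0.0).
    """
    marker_mean = mean(cell.get(g, 0.0) for g in markers)
    reference_mean = mean(cell.get(g, 0.0) for g in reference)
    return marker_mean - reference_mean


# Toy log-normalized expression for three hypothetical cells.
cells = {
    "cell_1": {"CD3D": 3.0, "CD3E": 2.0, "CD8A": 3.0, "GZMB": 4.0, "ACTB": 2.0, "GAPDH": 2.0},
    "cell_2": {"CD3D": 0.0, "CD3E": 0.0, "CD8A": 0.0, "GZMB": 0.0, "ACTB": 2.0, "GAPDH": 2.0},
    "cell_3": {"CD3D": 4.0, "CD3E": 4.0, "CD8A": 2.0, "ACTB": 1.0, "GAPDH": 3.0},
}

scores = {name: marker_score(expr, T_CELL_MARKERS, REFERENCE_GENES) for name, expr in cells.items()}

# Flag a cell as T-cell-like when its score is positive (the cutoff is an assumption).
t_cell_like = [name for name, score in scores.items() if score > 0]
print(scores)
print(t_cell_like)
```

A fixed reference set is a blunt baseline. On real data, controls matched by expression level make the scores comparable across genes with very different abundance.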
VALIDATION OUTCOME:
- Confirmed the presence of T-cell populations within tumor samples.
- Cytotoxic markers (CD8A, GZMB) indicated an active immune response.
- Regulatory and checkpoint markers (FOXP3, PDCD1) suggested immune suppression mechanisms.
INTERPRETATION:
The single-cell analysis supports bulk RNA-seq findings by demonstrating that:
- LUAD exhibits higher T-cell activity, consistent with an immune "hot" phenotype.
- LUSC shows reduced immune presence, consistent with an immune "desert" phenotype.
This validation strengthens confidence in the observed histological immune differences.
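The kind of comparison behind such statements can be sketched as a T-cell fraction per group, computed from per-cell annotations. The annotations and group labels below are invented for illustration and do not reproduce the validation dataset or its results.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell annotations: (group label, annotated cell type).
annotations = [
    ("LUAD", "T cell"), ("LUAD", "T cell"), ("LUAD", "Macrophage"), ("LUAD", "Epithelial"),
    ("LUSC", "T cell"), ("LUSC", "Macrophage"), ("LUSC", "Epithelial"),
    ("LUSC", "Epithelial"), ("LUSC", "Epithelial"),
]


def cell_type_fractions(rows):
    """Return {group: {cell_type: fraction of that group's cells}}."""
    counts = defaultdict(Counter)
    for group, cell_type in rows:
        counts[group][cell_type] += 1
    return {
        group: {cell_type: n / sum(c.values()) for cell_type, n in c.items()}
        for group, c in counts.items()
    }


fractions = cell_type_fractions(annotations)
for group, by_type in fractions.items():
    print(group, f"T-cell fraction = {by_type.get('T cell', 0.0):.2f}")
```

Using fractions rather than raw cell counts keeps the comparison fair when the groups contribute different numbers of cells.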
CODE AVAILABILITY
All Python scripts for the T-cell project are available in the repository:
👉 Browse the scripts: View scripts
T-CELL LINEAGE USED:
- Core T-cell populations:
i) CD8+ cytotoxic T-cells (CD8A, CD8B)
ii) CD4+ helper T-cells (CD4)
iii) Regulatory T-cells (Tregs)
- General lymphocyte markers:
i) CD3 complex (CD3D, CD3E), which identifies T-cell presence regardless of specific subtype
- Functional effector cells:
i) Activated cytotoxic lymphocytes (GZMB, PRF1)
- Comparative non-T-cell markers (identified through deconvolution):
i) Natural Killer (NK) cells
ii) Macrophages
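The marker panel above maps naturally onto a dictionary, and a per-sample lineage score can then be computed from bulk counts. The sketch below is a minimal illustration under stated assumptions: the score is the mean log2(CPM + 1) over each lineage's markers, FOXP3 is used for Tregs because it is the regulatory marker named in the validation section, and the sample counts (including the `OTHER` placeholder for the rest of the transcriptome) are invented so the arithmetic is easy to follow. It is not the project's scoring method.

```python
import math
from statistics import mean

# Marker panel from the list above; FOXP3 for Tregs is taken from the validation section.
MARKER_PANEL = {
    "pan_T": ["CD3D", "CD3E"],
    "CD8_cytotoxic": ["CD8A", "CD8B"],
    "CD4_helper": ["CD4"],
    "Treg": ["FOXP3"],
    "effector": ["GZMB", "PRF1"],
}


def log_cpm(counts: dict[str, int]) -> dict[str, float]:
    """Library-size normalize raw counts to log2(counts-per-million + 1)."""
    total = sum(counts.values())
    return {gene: math.log2(c * 1_000_000 / total + 1) for gene, c in counts.items()}


def lineage_scores(counts: dict[str, int]) -> dict[str, float]:
    """Mean log2(CPM + 1) over each lineage's markers; missing genes count as 0."""
    expr = log_cpm(counts)
    return {
        lineage: mean(expr.get(gene, 0.0) for gene in genes)
        for lineage, genes in MARKER_PANEL.items()
    }


# Hypothetical samples; "OTHER" stands in for all remaining genes so totals are 1,000,000.
samples = {
    "LUAD_1": {"CD3D": 255, "CD3E": 63, "OTHER": 999_682},
    "LUSC_1": {"CD3D": 15, "CD3E": 3, "OTHER": 999_982},
}

for name, counts in samples.items():
    print(name, lineage_scores(counts)["pan_T"])
```

With full cohorts, the per-sample scores could be summarized as a median per subtype, which is the quantity the findings below refer to.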
KEY FINDINGS AND KEY TAKEAWAYS
- Histological distinction:
PCA confirms that LUAD and LUSC possess distinct global transcriptomic fingerprints despite their shared organ of origin. View PCA Plot
- Immune architecture:
LUAD exhibits a more consistent immune-active microenvironment, characterized by higher median T-cell infiltration scores. View Subtype Comparison
- T-cell signaling:
Critical cytotoxic markers (CD8A, GZMB) show a significant histological preference, revealing the "hot" vs "cold" nature of these tumors. View Volcano Plot
- Proliferative tradeoff:
LUSC demonstrates a hyper-proliferative signature (mitotic spindle organization), which correlates with a "colder" or more excluded immune profile. View LUSC Pathways
- Cellular composition:
Digital deconvolution reveals a higher abundance of cytotoxic lymphocytes in LUAD, while macrophages remain a stable component in both histologies. View Deconvolution Heatmap
- Functional mechanism:
LUAD is significantly enriched in pathways related to antigen processing and apoptotic cell clearance. View LUAD Pathways
- Validation of findings:
Single-cell RNA-seq analysis confirms the presence of T-cell populations and supports the observed immune-active (LUAD) and immune-desert (LUSC) phenotypes.
- T-cell abundance across cell populations is consistent with the same immune-active (LUAD) and immune-desert (LUSC) pattern. View T-cell Abundance Validation
- UMAP visualization further highlights the clustering of immune cells and the distribution of T-cell populations across the dataset. View UMAP Validation Atlas
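For readers unfamiliar with how a volcano plot separates genes by subtype, here is a minimal sketch of the labelling logic. The cutoffs (|log2 fold change| >= 1, adjusted p < 0.05), the sign convention, and the gene rows are assumptions for illustration. The `log2FoldChange` and `padj` names mirror the columns of a PyDESeq2 results table, but nothing here reproduces the project's actual results.

```python
import math


def classify_gene(log2_fc: float, padj: float, lfc_cutoff: float = 1.0, alpha: float = 0.05) -> str:
    """Label one gene for a LUAD-vs-LUSC volcano plot.

    A positive log2 fold change is taken to mean higher in LUAD, which is an
    assumption about how the contrast was set up. A missing adjusted p-value
    (NaN) is treated as not significant.
    """
    if math.isnan(padj) or padj >= alpha or abs(log2_fc) < lfc_cutoff:
        return "not significant"
    return "higher in LUAD" if log2_fc > 0 else "higher in LUSC"


# Hypothetical rows shaped like a DGE results table: gene -> (log2FoldChange, padj).
results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.8, 0.010),
    "GENE_C": (0.4, 0.0005),  # significant but small effect
    "GENE_D": (3.0, 0.200),   # large effect but not significant
}

labels = {gene: classify_gene(lfc, padj) for gene, (lfc, padj) in results.items()}
for gene, label in labels.items():
    print(gene, label)
```

Requiring both an effect-size cutoff and a significance cutoff is what produces the two "wings" of a volcano plot. Genes that pass only one criterion stay in the unlabelled center.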
REPOSITORY STRUCTURE
- Data - raw and audited count matrices (LUAD/LUSC)
- Scripts - Python scripts for auditing, DGE and plotting
- Results - statistical output tables (DEGs, enrichment scores)
- Figures - generated QC plots, heatmaps and volcano plots
- accessions/accessions.txt - list of TCGA case IDs and file UUIDs
- validation/
  - data - scRNA-seq dataset
  - scripts - validation scripts
  - figures - UMAP and barplot validation plots
  - results - validation outputs
  - README.md - validation documentation
- README.md - general project documentation
- .gitignore - files to exclude from version control
- license - MIT license
TOOLS & SOFTWARE
Language: Python 3.10+
Statistics: PyDESeq2, GSEApy
Data handling: Pandas, NumPy
Visualization: Matplotlib, Seaborn, Bioinfokit
APIs: MyGene.info
Single-cell analysis: Scanpy
LICENSE
License: MIT
CONTRIBUTORS
- Mbaoji Florence Nwakaego, Department of Pharmacology and Toxicology, Faculty of Pharmaceutical Sciences, University of Nigeria Nsukka, Nsukka, Enugu State, Nigeria
- Chemutai Queen, Department of Biochemistry, Faculty of Biomedical Sciences, Jomo Kenyatta University of Agriculture and Technology, Kenya
- John Nnaemeka Nkwocha, Department of Biochemistry, University of Port Harcourt, Choba, Rivers State, Nigeria
- Usman Yalwa, MSc Bioinformatics student, Kalinga University, India
- Mark Matthew Edet, Department of Morphological Veterinary Medicine, Chungbuk National University, South Korea; Department of Human Biochemistry, Faculty of Basic Medical Sciences, Nnamdi Azikiwe University, Nnewi, Nigeria
- Zilungile Coki, SSc (Hons) Biotechnology student, University of the Western Cape, South Africa
- Oluwaseun Martins Olowabi, Cancer Research and Molecular Biology, Department of Biochemistry, University of Ibadan
- Valerie Martins, Department of Cell Biology and Genetics, University of Lagos, Akoka
- Eva Akurut, Department of Immunology and Molecular Biology, College of Health Sciences, Makerere University, Kampala, Uganda; The African Centre of Excellence in Bioinformatics and Data-Intensive Sciences, Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Olaitan I. Awe (Project Advisor), African Society for Bioinformatics and Computational Biology (ASBCB), Cape Town, South Africa
ACKNOWLEDGEMENTS
We thank the NIH Office of Data Science Strategy for their support before and during the October 2026 Omics Codeathon, co-organized with the African Society for Bioinformatics and Computational Biology (ASBCB). We also thank Dr. Awe for his ongoing guidance and all collaborators who contributed to this project.

