windowCNV

windowCNV is a window-based tool for detecting copy number alterations (CNAs) from single-cell RNA-seq data. It is inspired by infercnvpy and extends that package with flexible CNA simulation, inference, and evaluation. It supports cell type–aware analysis and event-level performance metrics.

Note: This package is experimental. It has been benchmarked against infercnvpy on simulated and real datasets, but CNA classification accuracy on real-world datasets may be limited. Feedback and contributions are welcome.


Features

  • Simulate CNAs globally or per cell type
  • Infer CNAs from gene expression using customizable smoothing windows
  • Assign CNA events to individual cells
  • Evaluate precision, recall, and F1 score at the event level
  • Visualize CNAs with heatmaps and summary tables
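
To make two of the features above concrete, here is a minimal sketch of window-based smoothing and event-level scoring. It is an illustration under stated assumptions, not windowCNV's actual implementation: the function names (smooth_windows, event_metrics), the Event tuple, the default window and step values, and the rule that a called event counts as correct when it overlaps a true event of the same type on the same chromosome are all hypothetical choices made for this example.

```python
from typing import NamedTuple

import numpy as np


def smooth_windows(expr, window_size=100, step=10):
    """Running mean over genes (columns) ordered by genomic position.

    Hypothetical helper: expr is a cells x genes array for one chromosome.
    Returns one smoothed value per window, keeping every `step`-th window.
    """
    expr = np.asarray(expr, dtype=float)
    window = min(window_size, expr.shape[1])  # shrink window on short chromosomes
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, expr
    )
    return smoothed[:, ::step]


class Event(NamedTuple):
    """Hypothetical CNA event record used only in this sketch."""
    chrom: str
    start: int
    end: int
    kind: str  # "gain" or "loss"


def overlaps(a, b):
    """True if two events share chromosome, type, and at least one base."""
    return (a.chrom == b.chrom and a.kind == b.kind
            and a.start < b.end and b.start < a.end)


def event_metrics(truth, called):
    """Precision, recall, and F1 under the overlap rule assumed above."""
    hit_called = sum(any(overlaps(c, t) for t in truth) for c in called)
    hit_truth = sum(any(overlaps(t, c) for c in called) for t in truth)
    precision = hit_called / len(called) if called else 0.0
    recall = hit_truth / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A larger window gives smoother profiles at the cost of resolution for short events, and a stricter matching rule (for example, requiring a minimum overlap fraction) would lower both precision and recall.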

Installation

We recommend using a dedicated conda environment:

conda create -n windowcnv python=3.10
conda activate windowcnv

Then install windowCNV and its required dependencies:

pip install infercnvpy scanpy matplotlib pandas
pip install git+https://github.com/Li-Jiabei/windowCNV.git

Getting Started

Import the required packages:

import numpy as np
import pandas as pd
import scanpy as sc
import infercnvpy as cnv
import matplotlib.pyplot as plt
import warnings
from collections.abc import Sequence

import windowCNV as wcnv

Example Notebooks and Data

You can explore how to use windowCNV in the following notebooks:

Simulated CNAs (Benchmarking & Validation)

These notebooks use the benchmarking dataset: PBMC_simulated_cnas_041025.h5ad


This notebook uses the dataset: pbmc_10k_v3_filtered_feature_bc_matrix.h5

Note: Many real-world datasets (including the one above) lack chromosome and genomic position annotations in AnnData.var. To address this, we provide a helper function for automatic annotation; its usage is shown in the notebook. You must supply the gene annotation file yourself.

In our example, we use the following reference file: mart_export_GRCh38.p14.txt. This file contains:

  • Gene stable ID
  • Gene name
  • Chromosome/scaffold name
  • Gene start (bp)
  • Gene end (bp)

The file was generated using Ensembl BioMart, which makes it easy to download these annotations for a given genome build.
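
For readers who want to see what such an annotation step involves, the following pandas sketch joins a BioMart export onto a gene table. It is not the helper function shipped with windowCNV, whose name and signature are not shown here; annotate_var is a hypothetical name. The sketch assumes that genes are indexed by gene name, that the export is tab-separated with the column headers listed above, and that the target columns are called chromosome, start, and end with a "chr" prefix on chromosome names, as infercnvpy-style tools commonly expect.

```python
import io

import pandas as pd

# Assumed mapping from BioMart headers to the target column names.
BIOMART_COLUMNS = {
    "Gene name": "gene_name",
    "Chromosome/scaffold name": "chromosome",
    "Gene start (bp)": "start",
    "Gene end (bp)": "end",
}


def annotate_var(var, annot):
    """Return a copy of `var` with chromosome, start, and end columns added.

    Hypothetical helper: `var` is a DataFrame indexed by gene name (as in
    AnnData.var) and `annot` is a BioMart export loaded with pandas.
    Genes missing from the annotation are left as NaN.
    """
    annot = annot.rename(columns=BIOMART_COLUMNS)[list(BIOMART_COLUMNS.values())]
    annot = (
        annot.dropna(subset=["gene_name"])
        .drop_duplicates("gene_name")  # keep the first record per gene name
        .set_index("gene_name")
    )
    out = var.copy()
    for col in ("chromosome", "start", "end"):
        out[col] = annot[col].reindex(out.index).to_numpy()
    # BioMart reports "1", "X", ...; add the "chr" prefix (an assumption).
    out["chromosome"] = out["chromosome"].map(
        lambda c: f"chr{c}" if pd.notna(c) else c
    )
    return out


# Tiny made-up example standing in for a real BioMart export.
example_tsv = (
    "Gene stable ID\tGene name\tChromosome/scaffold name\t"
    "Gene start (bp)\tGene end (bp)\n"
    "ID_A\tGENEA\t1\t100\t200\n"
    "ID_B\tGENEB\tX\t300\t400\n"
)
example_annot = pd.read_csv(
    io.StringIO(example_tsv), sep="\t", dtype={"Chromosome/scaffold name": str}
)
example_var = pd.DataFrame(index=["GENEA", "GENEB", "GENEC"])
annotated = annotate_var(example_var, example_annot)
```

With a real AnnData object, the result would be assigned back with something like adata.var = annotate_var(adata.var, annot). Genes without a match are usually dropped before CNA inference.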


PSC scRNA-seq data with previously reported PSC CNAs

This notebook uses the dataset: SCP2745_high_conf_CAS_cell_types.h5ad

This notebook uses the dataset: GSM4476486_combined_UMIcount_CellTypes_TNBC1.txt.gz

This notebook uses the datasets: GSM7744300_GUIDEvsNT_CHR14_RESULTS.txt and gencode.v38.annotation.gtf

About

A window-based refinement of inferCNV for more accurate CNV detection from single-cell RNA-seq data, featuring customizable smoothing, cell type annotations, and event-level evaluation tools.
