Potential bug in preprocess.py: Feature overwriting due to reference assignment and Pandas indexing #2

Hi there,

Thank you for sharing the code for HEIST! While reviewing the data preprocessing pipeline in utils/preprocess.py, I noticed a couple of potential issues in the loop that constructs the graphs list.

Specifically, in this block:

```python
for k in tqdm(range(len(adata.obs.cell_type))):
    G_gene = gene_network_dict[adata.obs.cell_type[k]]
    G_gene.num_nodes = NUM_GENES
    G_gene.cell_type = G_cell.cell_type[k]
    G_gene.X = torch.from_numpy(adata.X[k].reshape(NUM_GENES, 1))
    graphs.append(G_gene)
```

I believe there are two unintended behaviors here:

Object Reference Overwriting: G_gene = gene_network_dict[...] binds a reference to the PyG graph stored in gene_network_dict rather than creating a copy, so the subsequent assignments (G_gene.X = ... and G_gene.cell_type = ...) mutate the shared object in place. Consequently, the graphs list ends up holding repeated references to one graph object per cell type. The gene expression features (.X) are overwritten on every iteration, leaving all cells of a given type with the expression profile of the last cell of that type processed in the loop. This seems to conflict with the paper's design, which assigns cell-specific initial expression features to each graph.
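
To illustrate the aliasing, here is a minimal sketch that uses only the standard library. The Graph class, the toy expression values, and the use of copy.copy are my own stand-ins for illustration, not code from the repository.

```python
import copy


class Graph:
    """Hypothetical stand-in for a PyG graph object."""

    def __init__(self, edge_index):
        self.edge_index = edge_index


# One template graph per cell type, as in gene_network_dict.
gene_network_dict = {"T cell": Graph(edge_index=[(0, 1)])}
cell_types = ["T cell", "T cell"]
expression = [[1.0, 2.0], [3.0, 4.0]]  # toy per-cell expression rows

shared = []
for k, ct in enumerate(cell_types):
    g = gene_network_dict[ct]  # reference to the template, not a copy
    g.X = expression[k]        # overwrites the template's features
    shared.append(g)

copied = []
for k, ct in enumerate(cell_types):
    g = copy.copy(gene_network_dict[ct])  # new object per cell
    g.X = expression[k]                   # set only on this copy
    copied.append(g)

print(shared[0] is shared[1])      # both entries are the same object
print(copied[0].X, copied[1].X)    # each copy keeps its own features
```

In the actual pipeline, copying the template before assigning to it (for example with the clone() method on PyG Data objects, or copy.copy) might be enough, assuming the entries of gene_network_dict are ordinary Data objects.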

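Pandas Indexing Error: adata.obs.cell_type[k] passes an integer k to Series.__getitem__, which is primarily label-based. AnnData indexes obs by string barcodes, so there is no label equal to k. Depending on the pandas version, the lookup then either falls back to positional access (recent pandas 2.x releases warn that this fallback is deprecated) or raises a KeyError where the fallback is no longer available. Using adata.obs.cell_type.iloc[k] would make the positional intent explicit.
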
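
A small sketch of explicit positional access, assuming an obs table indexed by string barcodes. The DataFrame, barcodes, and labels below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs: string barcodes as the index.
obs = pd.DataFrame(
    {"cell_type": ["T cell", "B cell", "T cell"]},
    index=["AAAC-1", "AAAG-1", "AACT-1"],
)

# .iloc is always positional, so it does not depend on the index labels.
by_position = [obs["cell_type"].iloc[k] for k in range(len(obs))]

# Alternative: convert once to a plain array and index that inside the loop.
cell_types = obs["cell_type"].to_numpy()

print(by_position)
print(cell_types[0])
```

Either form should behave the same regardless of how the installed pandas version treats integer keys on a string index.
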
Could you please confirm if this aligns with your intended logic? Thank you again for your time and the great work!
