Description
Describe the bug
This may be an AnnData issue rather than a Muon one, and is arguably not a bug at all. However, I think it is more likely to come up when using Muon, and it is hard to diagnose, so I believe it deserves at least a warning.
When plotting feature data with mu.pl.embedding with the default use_raw=True, the plotted values can be wrong if the var_names of the object and of its .raw do not match.
This may seem like an unusual situation, but it is easy to end up in it inadvertently by calling mdata.var_names_make_unique() after combining two modality datasets that already have .raw set.
To Reproduce
import numpy as np
import pandas as pd
import re
import scanpy as sc
import muon as mu
# 10x PBMC public data
mdata = mu.read_10x_h5('pbmc5k_protein/5k_pbmc_protein_v3_filtered_feature_bc_matrix.h5')
mdata.var_names_make_unique()
# Create pointers to each modality and set raw objects.
rna = mdata['rna']
mdata['rna'].raw = rna.copy()
prot = mdata['prot']
mdata['prot'].raw = prot.copy()
# Run UMAP
sc.pp.pca(rna)
sc.pp.pca(prot)
sc.pp.neighbors(rna)
sc.pp.neighbors(prot)
sc.tl.umap(rna)
Works great:
mu.pl.embedding(mdata,
basis = 'rna:umap',
color = ['CD4', 'CD8A', 'CD4_TotalSeqB', 'CD8a_TotalSeqB'],
s = 50,
vmax = "p99"
)
Now let's assume that the antibody data is not annotated with the '_TotalSeqB' suffix, and we need to make the var_names unique between modalities:
mdata['prot'].var_names = [re.search(".+(?=_TotalSeqB)",i).group(0) for i in mdata['prot'].var_names]
mdata['prot'].raw = prot.copy()
# Need to make the var_names unique between modalities
mdata.var_names_make_unique()
Now the embedding plot is completely messed up for both modalities:
mu.pl.embedding(mdata,
basis = 'rna:umap',
color = ['rna:CD4', 'rna:CD8A', 'prot:CD4', 'prot:CD8a'],
s = 50,
vmax = "p99",
)
Note that specifying use_raw = False fixes this.
It can also be fixed by resetting .raw after the rename, so that the var_names in the raw object match again:
mdata['rna'].raw = rna.copy()
mdata['prot'].raw = prot.copy()
mu.pl.embedding(mdata,
basis = 'rna:umap',
color = ['rna:CD4', 'rna:CD8A', 'prot:CD4', 'prot:CD8a'],
s = 50,
vmax = "p99"
)
It is worth noting that in Scanpy, similar attempts to plot var_names that do not match between the layer in use and the raw object will return an error if use_raw=True, albeit not one that explains where the discrepancy lies:
sc.pl.embedding(mdata['rna'],
basis = 'umap',
color = ['rna:CD4', 'rna:CD8A'],
s = 50,
vmax = "p99",
use_raw=True
)
Expected behaviour
When plotting using the raw object, and the var_names do not match between the current layers and raw, the function should return with a descriptive error or warning e.g. "Warning: Var_names between 'raw' and current layer do not match, may lead to unwanted behaviour".
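A minimal sketch of the kind of check I have in mind. The helper name check_raw_var_names is hypothetical and not part of muon, scanpy or anndata. It only compares the two sets of names, so it would not catch a pure reordering:

```python
import warnings


def check_raw_var_names(var_names, raw_var_names, label="modality"):
    """Warn if var_names and raw.var_names contain different names.

    Hypothetical helper, not part of muon, scanpy or anndata.
    Returns True when both sets of names match, False otherwise.
    Only set membership is compared, so order differences are ignored.
    """
    current = set(var_names)
    raw = set(raw_var_names)
    only_current = sorted(current - raw)
    only_raw = sorted(raw - current)
    if not only_current and not only_raw:
        return True
    warnings.warn(
        f"var_names of '{label}' and its .raw do not match: "
        f"{len(only_current)} name(s) only in var_names (e.g. {only_current[:3]}), "
        f"{len(only_raw)} name(s) only in raw (e.g. {only_raw[:3]}). "
        "Plotting with use_raw=True may give unexpected results.",
        UserWarning,
        stacklevel=2,
    )
    return False


# Possible usage with the MuData object from the example above
# (assumes `mdata` exists and each modality may or may not have .raw set):
# for name, adata in mdata.mod.items():
#     if adata.raw is not None:
#         check_raw_var_names(adata.var_names, adata.raw.var_names, label=name)
```

Running something like this before plotting would at least point to the modality where the names diverge.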
System
Python 3.12.9 | packaged by conda-forge | (main, Mar 4 2025, 22:44:42) [Clang 18.1.8 ]
macOS-15.5-arm64-arm-64bit
anndata 0.11.3
mudata 0.3.1
muon 0.1.7
numpy 2.1.3
pandas 2.2.3
scanpy 1.11.0