
Challenges in Using Panpipes for Single-Cell Multiomics Data Analysis #325

@liuchuan111


Using Panpipes for single-cell omics data analysis presents several challenges that make the tool difficult to apply effectively.

Tutorials and Examples: While the documentation states that Panpipes can ingest various file types, such as outs/filtered_feature_bc_matrix.h5 (from CellRanger) or AnnData h5ad objects (one per sample), the tutorials lack clarity on how to generate the h5ad objects for different modalities. Moreover, the provided examples mainly demonstrate single-sample analysis, with no guidance on analyzing multiple replicate samples from different groups, which would be more practical for most use cases.

Handling .yaml Files: Modifying the .yaml configuration file is also challenging due to the absence of clear instructions on which fields are mandatory or optional when working with scRNA-seq and/or ATAC-seq data.

Given these limitations, we encountered various error messages during the data ingest process. For example, we have two groups of samples, each containing 4 replicates (A1–A4 and B1–B4), and each sample was split into two parts, one for scRNA-seq and one for ATAC-seq. It is unclear which example or workflow we should follow for such a setup. Should we ingest the two modalities separately, or ingest all data in one submission file?
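To make the setup concrete, a hypothetical sample sheet for the eight samples could be generated as in the sketch below. The column names (`sample_id`, `group`, `rna_path`, `atac_path`) and the `data/...` directory layout are assumptions for illustration only and should be checked against the Panpipes documentation. Whether both modalities belong on the same row at all is exactly the open question here, since the RNA and ATAC libraries come from separate parts of each sample rather than from the same cells.

```python
import csv
import io

# Two groups with four replicates each: A1-A4 and B1-B4.
GROUPS = {"A": 4, "B": 4}


def build_rows(groups, rna_template, atac_template):
    """Return one dict per sample with hypothetical column names."""
    rows = []
    for group, n_replicates in groups.items():
        for i in range(1, n_replicates + 1):
            sample = f"{group}{i}"
            rows.append(
                {
                    "sample_id": sample,  # assumed column name
                    "group": group,  # assumed metadata column
                    "rna_path": rna_template.format(sample=sample),
                    "atac_path": atac_template.format(sample=sample),
                }
            )
    return rows


def rows_to_tsv(rows):
    """Serialise the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical directory layout; only the file names come from CellRanger.
rows = build_rows(
    GROUPS,
    rna_template="data/rna/{sample}/outs/filtered_feature_bc_matrix.h5",
    atac_template="data/atac/{sample}/outs/filtered_peak_bc_matrix.h5",
)
sample_sheet = rows_to_tsv(rows)
```

Writing `sample_sheet` to a text file gives a table with one line per sample, which at least makes it easy to try both variants (one combined file, or one file per modality) without editing paths by hand.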

Additionally, while ingesting outs/filtered_feature_bc_matrix.h5 for scRNA-seq data works fine, Panpipes fails to read the filtered_peak_bc_matrix output that CellRanger generates for ATAC-seq data. The error indicates that the directory has no features.tsv file, because it contains peaks.bed instead. To work around this, we merged the filtered_peak_bc_matrix.h5 files from these samples into a single atac.h5ad file. This left us with two .h5ad files, one for ATAC (atac.h5ad) and one for RNA (rna.h5ad), each containing all eight samples. However, ingesting these two .h5ad file paths through a single sample_file_qc.txt still resulted in the following error:

"ERROR
Exception #1
'builtins.OSError(---------------------------------------
Child was terminated by signal -1:
The stderr was:
"lib/python3.10/site-packages/docrep/decorators.py:43: SyntaxWarning: 'param_categorical_covariate_keys' is not a valid key!
doc = func(self, args[0].__doc__, *args[1:], **kwargs)
computing score 'MarkersNeutro_score'
WARNING: genes are not in var_names and ignored: Index(['ANXA1', 'ARG1', 'BPI', 'CD101', 'CD24', 'CD274', 'CSF3R', 'CXCL8',
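Regarding the missing features.tsv mentioned above, one possible workaround is to derive a features.tsv-style table from peaks.bed. The sketch below is untested against Panpipes. The three-column layout (feature ID, feature name, feature type), the `chrom:start-end` peak naming, and the `Peaks` feature type are assumptions modelled on the usual 10x feature table, and the function names are hypothetical. Whether the reader also expects a gzipped file or additional columns would need to be checked.

```python
import csv


def bed_to_feature_rows(bed_lines):
    """Turn BED lines (chrom, start, end, ...) into feature-table rows."""
    rows = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        peak_id = f"{chrom}:{start}-{end}"
        # Assumed layout: feature ID, feature name, feature type.
        rows.append((peak_id, peak_id, "Peaks"))
    return rows


def write_features_tsv(bed_path, out_path):
    """Read peaks from bed_path and write a tab-separated feature table."""
    with open(bed_path) as bed:
        rows = bed_to_feature_rows(bed)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return len(rows)
```

The row order follows peaks.bed, so it stays aligned with the rows of the accompanying matrix file as long as neither file is reordered.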

I am curious if anyone has successfully used Panpipes for analyzing their own single-cell multiomics data under similar conditions. If so, sharing insights or additional resources would be highly appreciated!
