Description
Using Panpipes for single-cell omics data analysis presents several challenges, making it difficult for users to effectively apply the tool.
Tutorials and Examples: While the documentation states that Panpipes can ingest various file types, such as outs/filtered_feature_bc_matrix.h5 (from CellRanger) or AnnData h5ad objects (one per sample), the tutorials lack clarity on how to generate the h5ad objects for different modalities. Moreover, the provided examples mainly demonstrate single-sample analysis, with no guidance on analyzing multiple replicate samples from different groups, which would be more practical for most use cases.
Handling .yaml Files: Modifying the .yaml configuration file is also challenging due to the absence of clear instructions on which fields are mandatory or optional when working with scRNA-seq and/or ATAC-seq data.
Given these limitations, we encountered various error messages during data ingestion. For example, we have two groups of samples with four replicates each (A1–A4 and B1–B4), and every sample was split into two portions, one for scRNA-seq and one for ATAC-seq. It is unclear which example or workflow we should follow for this setup. Should we ingest the two modalities separately, or ingest all data through a single submission file?
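To make the question concrete, here is a minimal sketch of what a single combined submission table for the eight samples could look like, with one row per sample. The column names (`sample_id`, `rna_path`, `rna_filetype`, `atac_path`, `atac_filetype`, `group`), the `cellranger` filetype value, and the directory layout are all assumptions for illustration and would need to be checked against the Panpipes ingest documentation. Whether one row per sample is even the right layout for separately processed RNA and ATAC portions is exactly what is unclear to us.

```python
import csv
import io

# Sample names from the setup described above: A1-A4 and B1-B4.
samples = [f"{group}{i}" for group in ("A", "B") for i in range(1, 5)]

# ASSUMPTION: these column names are illustrative, not confirmed Panpipes fields.
columns = ["sample_id", "rna_path", "rna_filetype",
           "atac_path", "atac_filetype", "group"]


def build_rows(sample_ids, data_dir="data"):
    """Build one row per sample; paths follow a hypothetical directory layout."""
    rows = []
    for sample in sample_ids:
        rows.append({
            "sample_id": sample,
            "rna_path": f"{data_dir}/{sample}/rna/filtered_feature_bc_matrix.h5",
            "rna_filetype": "cellranger",   # assumed value
            "atac_path": f"{data_dir}/{sample}/atac/filtered_peak_bc_matrix.h5",
            "atac_filetype": "cellranger",  # assumed value
            "group": sample[0],             # "A" or "B"
        })
    return rows


def write_submission(rows, handle):
    """Write the rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=columns,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


rows = build_rows(samples)
buffer = io.StringIO()
write_submission(rows, buffer)
submission_text = buffer.getvalue()
```

Passing an open file handle to `write_submission` instead of the in-memory buffer would write the same table to disk.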
Additionally, while ingesting outs/filtered_feature_bc_matrix.h5 for scRNA-seq data works fine, Panpipes fails to read the filtered_peak_bc_matrix output that CellRanger generates for ATAC-seq data. The error reports a missing features.tsv file in this directory, which contains peaks.bed instead. To work around this, we merged the filtered_peak_bc_matrix.h5 files from these samples into a single atac.h5ad file. In the end we had two .h5ad files, one for ATAC (atac.h5ad) and one for RNA (rna.h5ad), each containing all eight samples. However, ingesting both .h5ad file paths through a single sample_file_qc.txt still produced the following error:
"ERROR \
Exception #1 \
'builtins.OSError(--------------------------------------- \
Child was terminated by signal -1: \
The stderr was: \
"lib/python3.10/site-packages/docrep/decorators.py:43: SyntaxWarning: 'param_categorical_covariate_keys' is not a valid key! \
doc = func(self, args[0].doc, *args[1:], **kwargs) \
computing score 'MarkersNeutro_score' \
WARNING: genes are not in var_names and ignored: Index(['ANXA1', 'ARG1', 'BPI', 'CD101', 'CD24', 'CD274', 'CSF3R', 'CXCL8', \
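Regarding the missing features.tsv mentioned above, one possible alternative to merging into .h5ad would be to derive a features file from peaks.bed. The sketch below converts BED records into three-column rows (id, name, feature type). It assumes that peaks.bed holds tab-separated chrom, start, end records and that a `chrom:start-end` identifier with the feature type `Peaks` would be accepted. Neither point is confirmed as a supported Panpipes input, and the function name is hypothetical.

```python
def bed_to_feature_lines(bed_lines):
    """Turn BED peak records into three-column feature rows (id, name, type)."""
    feature_lines = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split("\t")[:3]
        peak = f"{chrom}:{start}-{end}"  # assumed identifier format
        feature_lines.append(f"{peak}\t{peak}\tPeaks")
    return feature_lines


# Two made-up peak records, only to show the transformation.
example = ["chr1\t10000\t10500\n", "chr1\t20000\t20750\n"]
feature_lines = bed_to_feature_lines(example)
```

The resulting lines could then be written to a features.tsv file (gzip-compressed if the reader expects features.tsv.gz) next to the existing barcodes and matrix files.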
I am curious if anyone has successfully used Panpipes for analyzing their own single-cell multiomics data under similar conditions. If so, sharing insights or additional resources would be highly appreciated!