30 changes: 30 additions & 0 deletions README.md
@@ -131,3 +131,33 @@ When training with leave-one-out validation, make sure to specify the drug index
* `best.W`, `best.alpha`, `best.eps`: model parameters snapshot for each training stage
* `best.test_hat`: Prediction on test set, using the best model for each stage
* `.ckpt` files are the final models in TensorFlow-compatible format.

# Working with Sparse Data (scRNAseq)

CellBox can train on data stored in sparse formats (e.g., scRNAseq count matrices), which reduces memory usage and can improve performance.

### 1. Data Preparation
Convert your expression and perturbation matrices to `scipy.sparse` Compressed Sparse Row (CSR) format and save them as `.npz` files.

```python
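import scipy.sparse

# expr_matrix_csr and pert_matrix_csr are your expression and perturbation
# matrices, already converted to CSR (e.g., with scipy.sparse.csr_matrix).
# Save each one as a compressed .npz file.
scipy.sparse.save_npz('data/expr.npz', expr_matrix_csr)
scipy.sparse.save_npz('data/pert.npz', pert_matrix_csr)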
```
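
As a rough sketch of the conversion step, assuming the data starts out as a dense NumPy array, the matrix can be converted to CSR, saved, and loaded back to confirm the round trip. The toy `expr_dense` array and the temporary output directory are illustrative placeholders, not part of CellBox:

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Toy dense matrix standing in for real expression data (hypothetical values).
expr_dense = np.array([[0.0, 2.0, 0.0],
                       [0.0, 0.0, 0.0],
                       [1.0, 0.0, 3.0]])

# Convert to Compressed Sparse Row format; only non-zero entries are stored.
expr_matrix_csr = scipy.sparse.csr_matrix(expr_dense)

# Save to a temporary location, then reload to check that nothing was lost.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "expr.npz")
scipy.sparse.save_npz(out_path, expr_matrix_csr)
reloaded = scipy.sparse.load_npz(out_path)

# Report the storage format, shape, and number of stored non-zero entries.
print(reloaded.format, reloaded.shape, reloaded.nnz)
```

The same pattern applies to the perturbation matrix. Reloading with `scipy.sparse.load_npz` before training is a cheap way to catch a matrix that was saved in the wrong format or shape.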

### 2. Configuration Update
In your experiment configuration JSON file (e.g., `configs/MyExperiment.json`), set `sparse_data` to `true` and point to the `.npz` files:

```json
{
    "sparse_data": true,
    "expr_file": "expr.npz",
    "pert_file": "pert.npz",
    ...
}
```

**Note**: When `sparse_data` is enabled, the Gaussian noise augmentation (`add_noise_level`) is currently not supported.
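
The constraints above could be checked before launching a run with something like the following sketch. The helper `check_sparse_config` is hypothetical and not part of CellBox, and it assumes that any non-zero `add_noise_level` counts as noise augmentation being enabled:

```python
import json


def check_sparse_config(cfg):
    """Return a list of problems found in a config dict (hypothetical helper)."""
    problems = []
    if cfg.get("sparse_data"):
        # Sparse mode expects both data files to be .npz archives.
        for key in ("expr_file", "pert_file"):
            if not str(cfg.get(key, "")).endswith(".npz"):
                problems.append(f"{key} should point to a .npz file")
        # Assumption: a non-zero value means noise augmentation is requested.
        if cfg.get("add_noise_level"):
            problems.append("add_noise_level is not supported with sparse_data")
    return problems


# Example config fragment; a real config file contains more keys than shown.
cfg = json.loads(
    '{"sparse_data": true, "expr_file": "expr.npz", '
    '"pert_file": "pert.npz", "add_noise_level": 0.1}'
)
problems = check_sparse_config(cfg)
for problem in problems:
    print(problem)
```

Returning a list rather than raising on the first problem lets all issues be reported at once.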