sanderlab · Ramrajnagar · Jan 28, 2026
diff --git a/README.md b/README.md
@@ -131,3 +131,33 @@ When training with leave-one-out validation, make sure to specify the drug index
 	* `best.W`, `best.alpha`, `best.eps`: model parameters snapshot for each training stage
 	* `best.test_hat`: Prediction on test set, using the best model for each stage
 	* `.ckpt` files are the final models in tensorflow compatible format.
+
+# Working with Sparse Data (scRNAseq)
+
+CellBox supports training on data stored in a sparse matrix format (e.g., scRNAseq count matrices) to improve memory efficiency and performance.
+
+### 1. Data Preparation
+Convert your expression and perturbation matrices to `scipy.sparse` Compressed Sparse Row (CSR) format and save them as `.npz` files.
+
+```python
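+import scipy.sparse
+
+# expr_matrix and pert_matrix are placeholders for your own dense arrays or sparse matrices.
+# csr_matrix() converts either kind of input to CSR before saving.
+scipy.sparse.save_npz('data/expr.npz', scipy.sparse.csr_matrix(expr_matrix))
+scipy.sparse.save_npz('data/pert.npz', scipy.sparse.csr_matrix(pert_matrix))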
+```
+
+### 2. Configuration Update
+In your experiment configuration JSON file (e.g., `configs/MyExperiment.json`), set `sparse_data` to `true` and point to the `.npz` files:
+
+```json
+{
+  "sparse_data": true,
+  "expr_file": "expr.npz",
+  "pert_file": "pert.npz",
+  ...
+}
+```
+
+**Note**: Gaussian noise augmentation (`add_noise_level`) is not currently supported when `sparse_data` is enabled.
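
To try the data preparation step end to end, a minimal sketch along the following lines may help. It assumes nothing about CellBox itself: the toy matrices, their shapes, and the temporary output directory are hypothetical stand-ins, and only `scipy.sparse.csr_matrix`, `save_npz`, and `load_npz` are used. It builds two small CSR matrices, saves them as `.npz` files, and reloads them to check that the contents and format survive the round trip.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Toy stand-ins for real expression and perturbation data (hypothetical values).
expr_dense = np.array([[0, 3, 0, 0],
                       [1, 0, 0, 2],
                       [0, 0, 0, 5]], dtype=np.float32)
pert_dense = np.array([[1, 0, 0, 0],
                       [0, 0, 1, 0],
                       [0, 1, 0, 0]], dtype=np.float32)

# Convert the dense arrays to Compressed Sparse Row (CSR) format.
expr_matrix_csr = scipy.sparse.csr_matrix(expr_dense)
pert_matrix_csr = scipy.sparse.csr_matrix(pert_dense)

# Save into a temporary directory so the sketch does not touch a real data folder.
out_dir = tempfile.mkdtemp()
expr_path = os.path.join(out_dir, "expr.npz")
pert_path = os.path.join(out_dir, "pert.npz")
scipy.sparse.save_npz(expr_path, expr_matrix_csr)
scipy.sparse.save_npz(pert_path, pert_matrix_csr)

# Reload and confirm that format, shape, and values are unchanged.
expr_loaded = scipy.sparse.load_npz(expr_path)
pert_loaded = scipy.sparse.load_npz(pert_path)

roundtrip_ok = (
    expr_loaded.format == "csr"
    and pert_loaded.format == "csr"
    and np.array_equal(expr_loaded.toarray(), expr_dense)
    and np.array_equal(pert_loaded.toarray(), pert_dense)
)
print("round trip ok:", roundtrip_ok)
print("stored non-zeros in expr:", expr_loaded.nnz, "of", expr_dense.size)
```

A check like this is cheap to run before pointing `expr_file` and `pert_file` at the saved files, since it catches matrices that were accidentally saved in a different sparse format.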