Extension of zkml for distributed proving using Ray, layer-wise partitioning, and Merkle trees.
⚠️ Status Note: This is an experimental research project. Also consider zk-torch.
- Make Merkle root public: Add root to public values so next chunk can verify it (Done)
- Complete proof generation: Connect chunk execution to actual proof generation (#8) (Done)
- Ray-Rust integration: Connect Python Ray workers to Rust proof generation (#9) (Done)
- GPU acceleration: ICICLE GPU backend for MSM operations (#10) (Done, see GPU Acceleration)
- Status and Limitations
- Overview
- Implementation
- Requirements
- Quick Start
- GPU Acceleration
- Testing
- References
This project implements a Ray-based distributed proving approach for zkml. It is experimental research code, intended for studying alternative approaches to zkML parallelization; it currently lacks formal security analysis and proof composition.
Proof Composition: This implementation generates separate proofs per chunk. It does not implement recursive proof composition or aggregation. Verifiers must check O(n) proofs rather than O(1), limiting succinctness.
Trust Domain:
- Merkle trees provide privacy for proof readers, not compute providers: The prover must know all weights and activations to generate a valid ZK proof. Merkle trees hide intermediate values from people reading the published proof, not from the compute provider during execution.
- Multi-party security requires different trust domains: Security only applies when chunks are distributed across different trust domains (e.g., your servers + AWS), not just different AWS regions.
- Comparison to TEE/FHE/MPC: Trusted Execution Environments (TEEs), Fully Homomorphic Encryption (FHE), and Multi-Party Computation (MPC) provide stronger privacy guarantees, but at computational costs that currently make them impractical for scalable AI workloads.
Consider this project if:
- Researching alternative zkml parallelization approaches
- Need examples of Ray integration for cryptographic workloads
- Studying Merkle-based privacy for intermediate computations
- Building distributed halo2 proving (not zkML-specific)
- Use case: You trust compute providers but want to limit public proof exposure, or the model is partitioned across multiple non-colluding organizations
Use alternatives if:
- Need to hide data from compute providers themselves → Requires TEEs/FHE/MPC
- Need single aggregated proof → Consider zk-torch
This repository extends zkml (see ZKML paper) with distributed proving capabilities. zkml provides an optimizing compiler from TensorFlow to halo2 ZK-SNARK circuits.
distributed-zkml adds:
- Layer-wise partitioning: Split ML models into chunks for parallel proving across GPUs via Ray
- Merkle tree commitments: Hash intermediate activations with Poseidon; only publish root in proof
- ICICLE GPU acceleration: Hardware-accelerated MSM operations
| Feature | zkml | distributed-zkml |
|---|---|---|
| Architecture | Single-machine | Distributed across GPUs |
| Scalability | Single GPU memory | Horizontal scaling |
| Privacy | Outputs public | Intermediate values hidden from proof readers via Merkle trees |
- Model Partitioning: Split model into chunks at layer boundaries
- Parallel Execution: Each chunk runs on a separate GPU via Ray
- Merkle Commitments: Hash intermediate outputs with Poseidon, only root is public
- On-Chain: Publish only the Merkle root (O(1) public values vs O(n) without)
Note: Each chunk produces a separate proof. This implementation does not aggregate proofs into a single succinct proof. Verifiers must check all chunk proofs individually (O(n) verification time). For single-proof aggregation, see zk-torch's accumulation-based approach.
```
Model: 9 layers -> 3 chunks

Chunk 1: Layers 0-2 -> GPU 1 -> Hash A
Chunk 2: Layers 3-5 -> GPU 2 -> Hash B
Chunk 3: Layers 6-8 -> GPU 3 -> Hash C

Merkle Tree:
           Root (public)
          /             \
     Hash(AB)         Hash C
     /      \
 Hash A   Hash B
```
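As a rough illustration of the flow above, the Python sketch below runs one Ray task per chunk and folds the resulting chunk hashes into a Merkle root matching the tree in the diagram. This is a simplified sketch, not the project's actual code: SHA-256 stands in for the Poseidon hash used inside the circuits, `run_chunk` is a hypothetical placeholder for layer execution plus Rust proof generation, and the chunk inputs are faked rather than chained from the previous chunk's activations.

```python
import hashlib
import ray

ray.init(ignore_reinit_error=True)

def poseidon_stub(data: bytes) -> bytes:
    # SHA-256 stands in for the Poseidon hash used by the real circuits.
    return hashlib.sha256(data).digest()

@ray.remote  # in the real system each task would request a GPU, e.g. @ray.remote(num_gpus=1)
def run_chunk(chunk_id: int, layers: tuple, chunk_input: bytes) -> bytes:
    # Hypothetical placeholder: execute layers[0]..layers[1], prove the chunk
    # with the Rust prover, and return the hash of the chunk's output activations.
    fake_output = chunk_input + f"layers {layers}".encode()
    return poseidon_stub(fake_output)

def merkle_root(leaves: list) -> bytes:
    # Pairwise hashing; an odd leaf (Hash C in the diagram) is promoted unchanged.
    level = list(leaves)
    while len(level) > 1:
        nxt = [poseidon_stub(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]

# 9 layers -> 3 chunks, one Ray task per chunk, as in the diagram above.
chunk_ranges = [(0, 3), (3, 6), (6, 9)]
futures = [run_chunk.remote(i, r, b"input") for i, r in enumerate(chunk_ranges)]
leaf_hashes = ray.get(futures)      # [Hash A, Hash B, Hash C]
root = merkle_root(leaf_hashes)     # only this root is published in the proof
print("Merkle root:", root.hex())
```

In the real pipeline, each chunk's proof exposes the root as a public value so that the next chunk (and the verifier) can check against it, as noted in the completed-work list above.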
| Scenario | Hidden? | Explanation |
|---|---|---|
| Proof readers reconstructing weights via model inversion | Yes | Intermediate activations are hashed, not exposed in proof |
| Compute provider seeing weights during execution | No | Provider must have weights to generate ZK proof |
| Compute provider seeing intermediate activations during execution | No | Provider computes them |
Key insight: Merkle trees hide intermediate values from people reading the published proof, not from the compute provider during execution. The prover must know all values to generate a valid ZK proof.
Security depends on trust domains, not physical location:
| Setup | Trust Domains | What's Private |
|---|---|---|
| Single AWS account (any region) | 1 | Nothing from AWS — they control all regions |
| Your servers + AWS | 2 | Your portion's weights never sent to AWS |
| AWS + Google + Azure | 3 | Each provider sees only their chunk (assuming non-collusion) |
Multi-party benefit: If the model is partitioned across different trust domains (e.g., your servers + AWS), no single party has the full model. Combined with Merkle trees, this provides layered privacy:
- Partitioning → limits what any single provider can access
- Merkle trees → limits what proof readers can observe
| Aspect | distributed-zkml | ZKTorch |
|---|---|---|
| Scaling strategy | Horizontal (more machines via Ray) | Vertical (proof compression via Mira) |
| Final output | N separate proofs | 1 accumulated proof |
| Verification cost | O(N) proofs to verify | O(1) single proof |
| Intermediate privacy | Merkle trees hide from proof readers | Exposed in proof |
| Base system | halo2 (~30M param limit) | Custom pairing-based (6B params tested) |
These approaches are orthogonal; in principle, Ray parallelism could be combined with Mira accumulation.
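To make the O(N) verification cost concrete, here is a hedged sketch of a verifier-side loop. `verify_chunk_proof` and the `input_root`/`output_root` fields are hypothetical names, but the chaining check mirrors the design above, where each chunk publishes a Merkle root that the next chunk and the verifier check against.

```python
def verify_all(chunk_proofs, verify_chunk_proof) -> bool:
    """Check every chunk proof individually (O(N)) and make sure the chunks chain:
    each chunk's public input commitment must equal the previous chunk's output root.
    Both `verify_chunk_proof` and the dict fields are hypothetical names."""
    prev_root = None
    for proof in chunk_proofs:
        if not verify_chunk_proof(proof):           # halo2 verification of one chunk
            return False
        if prev_root is not None and proof["input_root"] != prev_root:
            return False                            # chunks do not line up
        prev_root = proof["output_root"]
    return True
```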
```
distributed-zkml/
├── python/     # Python wrappers for Rust prover
├── tests/      # Distributed proving tests
└── zkml/       # zkml (modified for Merkle + chunking)
    ├── src/bin/prove_chunk.rs
    └── testing/
```
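A minimal sketch of how a wrapper in `python/` might hand off to the `prove_chunk` binary built from `zkml/src/bin/prove_chunk.rs`. The binary path assumes a standard `cargo build --release` layout, and the command-line flags shown are hypothetical; they only illustrate the Python/Ray to Rust boundary, not the binary's actual CLI.

```python
import subprocess
from pathlib import Path

# Path to the binary built from zkml/src/bin/prove_chunk.rs (cargo build --release).
PROVE_CHUNK = Path("zkml/target/release/prove_chunk")

def prove_chunk(model: str, inp: str, start_layer: int, end_layer: int, out_dir: str) -> Path:
    """Shell out to the Rust chunk prover. All flags below are hypothetical and
    only illustrate where the Python/Ray side hands off to Rust."""
    out = Path(out_dir) / f"chunk_{start_layer}_{end_layer}.proof"
    subprocess.run(
        [str(PROVE_CHUNK),
         "--model", model,
         "--input", inp,
         "--start-layer", str(start_layer),
         "--end-layer", str(end_layer),
         "--out", str(out)],
        check=True,
    )
    return out
```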
With Docker, you only need Docker and Docker Compose; everything else is provided in the container.
| Dependency | Notes |
|---|---|
| Rust (nightly) | Install via rustup |
| Python >=3.10 | |
| pip | pip install -e . |
| Build tools | Linux: build-essential pkg-config libssl-dev; macOS: Xcode CLI |
Python deps (installed via `pip install -e .`): `ray[default]>=2.31.0`, `msgpack`, `numpy`
Optional: NVIDIA GPU + CUDA 12.x + ICICLE backend for GPU acceleration
```bash
docker compose build dev
docker compose run --rm dev

# Inside container:
cd zkml && cargo test --test merkle_tree_test -- --nocapture
```

```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build
cd zkml && rustup override set nightly && cargo build --release && cd ..

# Python deps
pip install -e .
```

Uses ICICLE for GPU-accelerated MSM (Multi-Scalar Multiplication).
- NVIDIA GPU (tested on A10G/T4, compatible with A100/H100)
- CUDA 12.x
- Ubuntu 20.04+
```bash
# 1. Download ICICLE backend (Ubuntu 22.04 - use ubuntu20 for 20.04)
curl -L -o /tmp/icicle.tar.gz \
  https://github.com/ingonyama-zk/icicle/releases/download/v3.1.0/icicle_3_1_0-ubuntu22-cuda122.tar.gz

# 2. Install
mkdir -p ~/.icicle && tar -xzf /tmp/icicle.tar.gz -C /tmp && cp -r /tmp/icicle/lib/backend ~/.icicle/

# 3. Set env var (add to ~/.bashrc)
export ICICLE_BACKEND_INSTALL_DIR=~/.icicle/backend

# 4. Build with GPU
cd zkml && cargo build --release --features gpu

# 5. Verify
cargo test --test gpu_benchmark_test --release --features gpu -- --nocapture
```

Expected output:

```
Registered devices: ["CUDA", "CPU"]
Successfully set CUDA device 0
```
| MSM size (points) | GPU MSM time | Throughput |
|---|---|---|
| 2^14 (16K) | 6.5ms | 2.5M pts/sec |
| 2^16 (65K) | 7.9ms | 8.3M pts/sec |
| 2^18 (262K) | 13ms | 19.5M pts/sec |
- Measure FFT time: `HALO2_FFT_STATS=1`
- GPU NTT (experimental): `HALO2_USE_GPU_NTT=1` (currently slower due to conversion overhead)
```bash
# Simulation (fast)
python tests/simple_distributed.py \
  --model zkml/examples/mnist/model.msgpack \
  --input zkml/examples/mnist/inp.msgpack \
  --layers 4 --workers 2

# Real proofs
python tests/simple_distributed.py ... --real
```

```bash
cd zkml
cargo test --test merkle_tree_test --test chunk_execution_test -- --nocapture
```

CI runs on PRs to main/dev: builds zkml and runs tests (~3-4 min). GPU tests are excluded to save costs.
- ZKML Paper (EuroSys '24) - Original zkml framework
- zkml Repository - Base framework this project extends
- zk-torch - Alternative approach using proof accumulation/folding