Extension of zkml for distributed proving using Ray, layer-wise partitioning, and Merkle trees.
⚠️ Status Note: This is an experimental research project. Also consider zk-torch.
- Make Merkle root public: Add root to public values so next chunk can verify it (Done)
- Complete proof generation: Connect chunk execution to actual proof generation (#8) (Done)
- Ray-Rust integration: Connect Python Ray workers to Rust proof generation (#9) (Done)
- GPU acceleration: ICICLE GPU backend for MSM operations (#10) (Done, see GPU Acceleration)
- Status and Limitations
- Overview
- Implementation
- Requirements
- Quick Start
- GPU Acceleration
- Testing
- References
This project implements a Ray-based distributed proving approach for zkml. It is experimental research code, intended for studying alternative approaches to zkML parallelization; it currently lacks formal security analysis and proof composition.
Proof Composition: This implementation generates separate proofs per chunk. It does not implement recursive proof composition or aggregation. Verifiers must check O(n) proofs rather than O(1), limiting succinctness.
Trust Domain:
- Merkle trees provide privacy for proof readers, not compute providers: The prover must know all weights and activations to generate a valid ZK proof. Merkle trees hide intermediate values from people reading the published proof, not from the compute provider during execution.
- Multi-party security requires different trust domains: Security only applies when chunks are distributed across different trust domains (e.g., your servers + AWS), not just different AWS regions.
- Comparison to TEE/FHE/MPC: Trusted Execution Environments (TEEs), Fully Homomorphic Encryption (FHE), and Multi-Party Computation (MPC) provide stronger privacy guarantees, but at computational costs that currently make them impractical for scalable AI workloads.
Consider this project if:
- Researching alternative zkml parallelization approaches
- Need examples of Ray integration for cryptographic workloads
- Studying Merkle-based privacy for intermediate computations
- Building distributed halo2 proving (not zkML-specific)
- Use case: You trust compute providers but want to limit public proof exposure, or the model is partitioned across multiple non-colluding organizations
Use alternatives if:
- Need to hide data from compute providers themselves → Requires TEEs/FHE/MPC
- Need single aggregated proof → Consider zk-torch
This repository extends zkml (see ZKML paper) with distributed proving capabilities. zkml provides an optimizing compiler from TensorFlow to halo2 ZK-SNARK circuits.
distributed-zkml adds:
- Layer-wise partitioning: Split ML models into chunks for parallel proving across GPUs via Ray
- Merkle tree commitments: Hash intermediate activations with Poseidon; only publish root in proof
- ICICLE GPU acceleration: Hardware-accelerated MSM operations
| Feature | zkml | distributed-zkml |
|---|---|---|
| Architecture | Single-machine | Distributed across GPUs |
| Scalability | Single GPU memory | Horizontal scaling |
| Privacy | Outputs public | Intermediate values hidden from proof readers via Merkle trees |
- Model Partitioning: Split model into chunks at layer boundaries
- Parallel Execution: Each chunk runs on a separate GPU via Ray
- Merkle Commitments: Hash intermediate outputs with Poseidon, only root is public
- On-Chain: Publish only the Merkle root (O(1) public values vs O(n) without)
Note: Each chunk produces a separate proof. This implementation does not aggregate proofs into a single succinct proof. Verifiers must check all chunk proofs individually (O(n) verification time). For single-proof aggregation, see zk-torch's accumulation-based approach.
```
Model: 9 layers -> 3 chunks

Chunk 1: Layers 0-2 -> GPU 1 -> Hash A
Chunk 2: Layers 3-5 -> GPU 2 -> Hash B
Chunk 3: Layers 6-8 -> GPU 3 -> Hash C

Merkle Tree:
           Root (public)
          /             \
     Hash(AB)         Hash C
     /      \
 Hash A   Hash B
```
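As a rough illustration of the flow above, the Python sketch below runs one Ray task per chunk and folds the resulting chunk hashes into a Merkle root matching the tree in the diagram. This is a simplified sketch, not the project's actual code: SHA-256 stands in for the Poseidon hash used inside the circuits, `run_chunk` is a hypothetical placeholder for layer execution plus Rust proof generation, and the chunk inputs are faked rather than chained from the previous chunk's activations.

```python
import hashlib
import ray

ray.init(ignore_reinit_error=True)

def poseidon_stub(data: bytes) -> bytes:
    # SHA-256 stands in for the Poseidon hash used by the real circuits.
    return hashlib.sha256(data).digest()

@ray.remote  # in the real system each task would request a GPU, e.g. @ray.remote(num_gpus=1)
def run_chunk(chunk_id: int, layers: tuple, chunk_input: bytes) -> bytes:
    # Hypothetical placeholder: execute layers[0]..layers[1], prove the chunk
    # with the Rust prover, and return the hash of the chunk's output activations.
    fake_output = chunk_input + f"layers {layers}".encode()
    return poseidon_stub(fake_output)

def merkle_root(leaves: list) -> bytes:
    # Pairwise hashing; an odd leaf (Hash C in the diagram) is promoted unchanged.
    level = list(leaves)
    while len(level) > 1:
        nxt = [poseidon_stub(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]

# 9 layers -> 3 chunks, one Ray task per chunk, as in the diagram above.
chunk_ranges = [(0, 3), (3, 6), (6, 9)]
futures = [run_chunk.remote(i, r, b"input") for i, r in enumerate(chunk_ranges)]
leaf_hashes = ray.get(futures)      # [Hash A, Hash B, Hash C]
root = merkle_root(leaf_hashes)     # only this root is published in the proof
print("Merkle root:", root.hex())
```

In the real pipeline, each chunk's proof exposes the root as a public value so that the next chunk (and the verifier) can check against it, as noted in the completed-work list above.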
| Scenario | Hidden? | Explanation |
|---|---|---|
| Proof readers reconstructing weights via model inversion | Yes | Intermediate activations are hashed, not exposed in proof |
| Compute provider seeing weights during execution | No | Provider must have weights to generate ZK proof |
| Compute provider seeing intermediate activations during execution | No | Provider computes them |
Key insight: Merkle trees hide intermediate values from people reading the published proof, not from the compute provider during execution. The prover must know all values to generate a valid ZK proof.
Security depends on trust domains, not physical location:
| Setup | Trust Domains | What's Private |
|---|---|---|
| Single AWS account (any region) | 1 | Nothing from AWS — they control all regions |
| Your servers + AWS | 2 | Your portion's weights never sent to AWS |
| AWS + Google + Azure | 3 | Each provider sees only their chunk (assuming non-collusion) |
Multi-party benefit: If the model is partitioned across different trust domains (e.g., your servers + AWS), no single party has the full model. Combined with Merkle trees, this provides layered privacy:
- Partitioning → limits what any single provider can access
- Merkle trees → limits what proof readers can observe
| Aspect | distributed-zkml | ZKTorch |
|---|---|---|
| Scaling strategy | Horizontal (more machines via Ray) | Vertical (proof compression via Mira) |
| Final output | N separate proofs | 1 accumulated proof |
| Verification cost | O(N) proofs to verify | O(1) single proof |
| Intermediate privacy | Merkle trees hide from proof readers | Exposed in proof |
| Base system | halo2 (~30M param limit) | Custom pairing-based (6B params tested) |
These approaches are orthogonal; in principle, Ray parallelism could be combined with Mira accumulation.
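To make the O(N) verification cost concrete, here is a hedged sketch of a verifier-side loop. `verify_chunk_proof` and the `input_root`/`output_root` fields are hypothetical names, but the chaining check mirrors the design above, where each chunk publishes a Merkle root that the next chunk and the verifier check against.

```python
def verify_all(chunk_proofs, verify_chunk_proof) -> bool:
    """Check every chunk proof individually (O(N)) and make sure the chunks chain:
    each chunk's public input commitment must equal the previous chunk's output root.
    Both `verify_chunk_proof` and the dict fields are hypothetical names."""
    prev_root = None
    for proof in chunk_proofs:
        if not verify_chunk_proof(proof):           # halo2 verification of one chunk
            return False
        if prev_root is not None and proof["input_root"] != prev_root:
            return False                            # chunks do not line up
        prev_root = proof["output_root"]
    return True
```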
```
distributed-zkml/
├── python/     # Python wrappers for Rust prover
├── tests/      # Distributed proving tests
└── zkml/       # zkml (modified for Merkle + chunking)
    ├── src/bin/prove_chunk.rs
    └── testing/
```
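A minimal sketch of how a wrapper in `python/` might hand off to the `prove_chunk` binary built from `zkml/src/bin/prove_chunk.rs`. The binary path assumes a standard `cargo build --release` layout, and the command-line flags shown are hypothetical; they only illustrate the Python/Ray to Rust boundary, not the binary's actual CLI.

```python
import subprocess
from pathlib import Path

# Path to the binary built from zkml/src/bin/prove_chunk.rs (cargo build --release).
PROVE_CHUNK = Path("zkml/target/release/prove_chunk")

def prove_chunk(model: str, inp: str, start_layer: int, end_layer: int, out_dir: str) -> Path:
    """Shell out to the Rust chunk prover. All flags below are hypothetical and
    only illustrate where the Python/Ray side hands off to Rust."""
    out = Path(out_dir) / f"chunk_{start_layer}_{end_layer}.proof"
    subprocess.run(
        [str(PROVE_CHUNK),
         "--model", model,
         "--input", inp,
         "--start-layer", str(start_layer),
         "--end-layer", str(end_layer),
         "--out", str(out)],
        check=True,
    )
    return out
```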
With Docker, you only need Docker and Docker Compose; everything else is provided in the container.
| Dependency | Notes |
|---|---|
| Rust (nightly) | Install via rustup |
| Python >=3.10 | |
| pip | pip install -e . |
| Build tools | Linux: build-essential pkg-config libssl-dev; macOS: Xcode CLI |
Python deps (installed via `pip install -e .`): `ray[default]>=2.31.0`, `msgpack`, `numpy`
Optional: NVIDIA GPU + CUDA 12.x + ICICLE backend for GPU acceleration
```bash
docker compose build dev
docker compose run --rm dev

# Inside container:
cd zkml && cargo test --test merkle_tree_test -- --nocapture
```

```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build
cd zkml && rustup override set nightly && cargo build --release && cd ..

# Python deps
pip install -e .
```

Uses ICICLE for GPU-accelerated MSM (Multi-Scalar Multiplication).
- NVIDIA GPU (tested on A10G/T4, compatible with A100/H100)
- CUDA 12.x
- Ubuntu 20.04+
```bash
# 1. Download ICICLE backend (Ubuntu 22.04 - use ubuntu20 for 20.04)
curl -L -o /tmp/icicle.tar.gz \
  https://github.com/ingonyama-zk/icicle/releases/download/v3.1.0/icicle_3_1_0-ubuntu22-cuda122.tar.gz

# 2. Install
mkdir -p ~/.icicle && tar -xzf /tmp/icicle.tar.gz -C /tmp && cp -r /tmp/icicle/lib/backend ~/.icicle/

# 3. Set env var (add to ~/.bashrc)
export ICICLE_BACKEND_INSTALL_DIR=~/.icicle/backend

# 4. Build with GPU
cd zkml && cargo build --release --features gpu

# 5. Verify
cargo test --test gpu_benchmark_test --release --features gpu -- --nocapture
```

Expected output:

```
Registered devices: ["CUDA", "CPU"]
Successfully set CUDA device 0
```
| MSM size (points) | GPU MSM time | Throughput |
|---|---|---|
| 2^14 (16K) | 6.5ms | 2.5M pts/sec |
| 2^16 (65K) | 7.9ms | 8.3M pts/sec |
| 2^18 (262K) | 13ms | 19.5M pts/sec |
- Measure FFT time: `HALO2_FFT_STATS=1`
- GPU NTT (experimental): `HALO2_USE_GPU_NTT=1` (currently slower due to conversion overhead)
```bash
# Simulation (fast)
python tests/simple_distributed.py \
  --model zkml/examples/mnist/model.msgpack \
  --input zkml/examples/mnist/inp.msgpack \
  --layers 4 --workers 2

# Real proofs
python tests/simple_distributed.py ... --real
```

```bash
cd zkml
cargo test --test merkle_tree_test --test chunk_execution_test -- --nocapture
```

CI runs on PRs to main/dev: builds zkml and runs tests (~3-4 min). GPU tests are excluded to save costs.
- ZKML Paper (EuroSys '24) - Original zkml framework
- zkml Repository - Base framework this project extends
- zk-torch - Alternative approach using proof accumulation/folding