Conversation


@mfahsold mfahsold commented Jan 19, 2026

Summary

The current RPC implementation crashes the server with GGML_ASSERT when ggml_backend_graph_compute returns a non-success status. This causes distributed inference setups to fail completely when a single worker encounters a temporary error (memory pressure, backend issues, etc.).

Changes

  1. Added response structs for RPC_CMD_GRAPH_COMPUTE and RPC_CMD_GRAPH_RECOMPUTE:

    • rpc_msg_graph_compute_rsp with int32_t status
    • rpc_msg_graph_recompute_rsp with int32_t status
  2. Server-side changes:

    • Replaced GGML_ASSERT(status == GGML_STATUS_SUCCESS) with graceful error logging
    • Server now sends the actual ggml_status back to the client via RPC response
    • Server continues operating instead of crashing on compute failures
  3. Client-side changes:

    • ggml_backend_rpc_graph_compute now receives and returns the actual status from the server
    • Clients can properly handle non-success status codes (retry, failover, etc.)

Before

[RPC] Worker disconnects or compute fails
-> GGML_ASSERT crashes the server
-> All connected clients lose their sessions
-> Manual restart required

After

[RPC] Worker disconnects or compute fails
-> Error is logged: "[rpc_server::graph_compute] graph compute failed with status X"
-> Status propagated to client
-> Client can handle error appropriately
-> Server continues operating

Related Issues

Testing

Tested in a Kubernetes distributed inference setup with 2 RPC workers:

  • Verified that worker disconnection no longer crashes the coordinator
  • Verified that compute errors are properly logged and propagated
  • Verified that inference completes successfully under normal operation


Fixes: ggml-org#11929
Fixes: gpustack/gpustack#1178
@mfahsold mfahsold requested a review from rgerganov as a code owner January 19, 2026 16:12
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jan 19, 2026
Collaborator

@rgerganov rgerganov left a comment

Returning a response from graph_compute requires a network round-trip, which hurts performance. I don't think there is a legitimate use case where one graph_compute fails and then everything gets back to normal.

@mfahsold
Copy link
Author

mfahsold commented Jan 20, 2026 via email
