Skip to content

Conversation

@padraic-shafer
Copy link
Contributor

@padraic-shafer padraic-shafer commented Oct 30, 2025

Resolves #1153.

Adds an optional persist query parameter to PUT /array/... and PATCH /array/... routes.

When True (default) the data is written as usual; otherwise the data is only streamed and not persisted.

This option was also added to the write(), write_block(), and patch() methods on the DaskArrayClient.

Added new tests for PUT /array/full and PATCH /array/full.

Copy link
Contributor Author

@padraic-shafer padraic-shafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to proceed with a test for write_block. It's used sparsely at the moment.

@padraic-shafer padraic-shafer marked this pull request as ready for review October 31, 2025 01:21
Copy link
Contributor Author

@padraic-shafer padraic-shafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not too far out of scope, I also updated the websockets tests to receive all messages, with an extra read for the schema.

@padraic-shafer padraic-shafer force-pushed the array-no-persist branch 2 times, most recently from 6db03e5 to e387a13 Compare October 31, 2025 02:13
@padraic-shafer
Copy link
Contributor Author

I removed the locally edited docker-compose.yml that should not be part of this PR.

@danielballan
Copy link
Member

I'm not sure how to proceed with a test for write_block. It's used sparsely at the moment.

The write_block method is used by c.write_array(arr) when arr has a chunks attribute. So one way to exercise that would be to use a dask.array or an h5py.Dataset as arr in a test. Maybe parameterizing over numpy array versus [pick your favorite chunked array].

def write_block(x, block_id, client):
if len(block_id):
client.write_block(x, block=block_id)
return x

@danielballan
Copy link
Member

My previous comment does prompt an interesting design question: what to do with tiled.client.container:Container.write_array. That is what most users interact with today, and it's convenience method around:

x = Container.new(...)  # POST
x.write(arr)  # PUT
# (or x.write_block if chunked)

Above I had in mind that persist would b exposed on write_array and passed through. But I wonder if this could be too confusing. The POST will always persist. (Without a POST, there's no id to put in Redis, and no schema to provide to subscribers.) So c.write_array(..., persist=False) would mean, "Create a node around this data, and broadcast these initial values, but leave the persist data "empty". That may be too non-intutitive.

Maybe it's better to guide users who want to use this feature to call new(...) directly and then, separately, write or write_block with persist=False.

@padraic-shafer
Copy link
Contributor Author

padraic-shafer commented Oct 31, 2025

Do these changes make the tests more readable or less clear?
34e26f8
(12a3256, after rebase)

@genematx
Copy link
Contributor

I'm probably misunderstanding this, but what happens to the data if the user sends several patch array requests with mixed persistence (i.e. some of them persisted, then a few -- not, then persisted again...). Is there a chance to end up with inconsistent structures and data between the cache and storage?

I've read this comment in the issue

If we stipulate that give data updates can be transient but the structures they related to are always persisted in the catalog

but it's not clear to me how it works here.

@danielballan
Copy link
Member

Capturing a brief discussion:

The question of how to handle transient writes which change the shape of the array seems subtle and worth some care. But we can implement PUT, which replaces all the data with change in structure. This would be sufficient to enable the first motivating use case.

We can even implement PATCH conditional on the query parameter extend being false. This, likewise, does not alter the structure.

We can tackle separately, later, the question of whether and how to support transient writes that alter the structure.

@padraic-shafer
Copy link
Contributor Author

padraic-shafer commented Nov 7, 2025

In b6f5c2e (f56116d, after rebase) I've trivialized the PATCH test with persist to create the same results as the PUT test. I think this is ok because it emphasizes persist in these tests, whereas PATCH behavior is already tested elsewhere.

@padraic-shafer
Copy link
Contributor Author

I still plant to add:

  • a NotImplementedError for PATCH with extend + that test.
  • some docs for this new feature

@padraic-shafer
Copy link
Contributor Author

padraic-shafer commented Nov 8, 2025

I initially tried parameterizing test_updates_persist_append with exception testing in 59a3958 (7513a9e, after rebase), but the test hung when capturing the exception within the WS context. This was possibly a resource tear down issue.

I've now split test_updates_append_without_persist into its own test.

@padraic-shafer
Copy link
Contributor Author

padraic-shafer commented Nov 8, 2025

In a24a8c4 (46dd4e8, after rebase), added an optional list of self-serializing types to safe_json_dump(). This seemed like the cleanest option to avoid a circular import between tiled.utils, tiled.ndslice, and tiled.type_aliases.

@padraic-shafer
Copy link
Contributor Author

This PR is ready for review.

Copy link
Member

@danielballan danielballan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems solid! I love the docs.

Just two comments on error handing: one here in the review and another (accidentally) submitted as a standalone comment.

@danielballan danielballan merged commit 0388f21 into bluesky:main Nov 10, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support transient data that is streamed but not persisted

3 participants