
Conversation

@alanshaw (Member) commented Sep 5, 2025

Filepack is an archive format for transferring and storing content addressed data.

📚 Preview

🎥 Demo
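
For readers new to the idea, the gist of "content addressed" is that every block of bytes is identified by a CID derived from its hash, so the data can be verified independently of where it came from. The following is not the Filepack format itself, just a minimal illustration using the `multiformats` JS library:

```js
// Minimal illustration of content addressing (not the Filepack spec):
// a block of bytes is identified by the CID of its hash, so a reader can
// verify the bytes without trusting the transport or the storage.
import { CID } from 'multiformats/cid'
import * as raw from 'multiformats/codecs/raw'
import { sha256 } from 'multiformats/hashes/sha2'

const bytes = new TextEncoder().encode('hello filepack')
const digest = await sha256.digest(bytes)
const cid = CID.create(1, raw.code, digest) // CIDv1, raw codec, sha2-256
console.log(cid.toString()) // -> base32 string beginning "bafkrei..."
```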

@hannahhoward (Member) left a comment


Fundamentally, I love the filepack idea. To the extent this is a proposal for a thing we could do, it's LGTM from me.

Also, I think we might go even further; I'll make an RFC with some ideas.

What I'd like to do now, though, is talk through the implications of implementation and if/when we'd actually do the work to put this in production.

Is it true to say we could ship Filepack in the upload-service and it would "just work" in freeway? I think so, no? Perhaps further optimization would be needed, but it would work as is?

And then we'd need to modify guppy if we wanted things to be standard.

What I am ultimately wondering is: do we push on implementation now, given all that we have to implement in the next few months? It is another thing to fit in. Or do we wait for a lull after warm storage ships and fit it in alongside UCAN 1.0 / tech debt?

@alanshaw (Member, Author) commented Sep 8, 2025

> What I'd like to do now, though, is talk through the implications of implementation and if/when we'd actually do the work to put this in production.

The only adverse implication I'm aware of is putting this data into Filecoin, which expects content-addressed data, i.e. a CAR (well, the boost code expects the first few bytes of deal data to be a CAR). However, we already work around that by prepending an empty CAR piece to every aggregate.
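
For concreteness, here's a hedged sketch of that workaround, assuming the `@ipld/car` JS library; the real aggregation pipeline may construct this differently:

```js
// Sketch: produce the bytes of an empty CARv1 (header only, no roots, no
// blocks). Prepending a piece like this to an aggregate satisfies code
// (e.g. boost) that expects deal data to begin with a CAR header.
import { CarWriter } from '@ipld/car'

async function emptyCarBytes () {
  const { writer, out } = CarWriter.create([]) // no roots
  void writer.close() // nothing to put; this just flushes the header
  const chunks = []
  for await (const chunk of out) chunks.push(chunk)
  const bytes = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0))
  let offset = 0
  for (const c of chunks) { bytes.set(c, offset); offset += c.length }
  return bytes
}
```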

> Is it true to say we could ship Filepack in the upload-service and it would "just work" in freeway? I think so, no? Perhaps further optimization would be needed, but it would work as is?

Yes. In the demo (linked in the PR description) I showed retrieving data from freeway.

There would be a piece of work to optimise retrieval, yes, but it's compatible as is.

> And then we'd need to modify guppy if we wanted things to be standard.

IIRC the tracks are currently being laid in guppy. It might be preferable to implement while everyone's heads are "in the game" as it were.

> What I am ultimately wondering is: do we push on implementation now, given all that we have to implement in the next few months? It is another thing to fit in. Or do we wait for a lull after warm storage ships and fit it in alongside UCAN 1.0 / tech debt?

Good question. What's nice about this is that we can split the work in two: writes and reads. We could do the writes work now and the reads work a lot later...and the reads work would still yield a speed-up for the older data that we stored in the new format.

That said, it's all non-critical optimization work. I'd prefer we focused on more mission-critical tasks in general, but the writes implementation would probably affect only guppy at this point (since we have an implementation in the JS client already), so it's somewhat a question of whether @Peeja would be comfortable thrashing out the implementation in guppy while she's already knee-deep in it...

The other thing to consider is that guppy is targeted to upload a substantial amount of data to the network, and IMO it would be preferable to enable optimised retrieval for that data, even if it is not realized immediately. We cannot "go back" and enable this optimization for all that data.

@BravoNatalie (Contributor) commented

I know it's already in the demo video, but could you also share the TL;DR you presented here?

@alanshaw (Member, Author) commented Sep 8, 2025
