@alexdachin alexdachin commented Aug 17, 2025

Currently Trilium stores the contents of a File note inside the SQLite database. While this is great for simplicity and portability, it can become problematic once you start uploading many reference files (for example high-quality images, large PDFs, or videos). While SQLite supports huge databases in theory, storing many files in a single database:

  • makes the database file harder to back up
  • increases database lock times
  • slightly increases the risk of database corruption

I propose storing big blobs in the filesystem and storing a reference to them in the database. For simplicity, the files are stored in the data directory to begin with, but this also opens the door to storing them elsewhere in the future (such as S3-compatible storage systems). This keeps the SQLite database small and performant, enables faster incremental backups (for example with rsync), reduces the risk of database corruption, and still allows Trilium to manage things like the note hierarchy with cloning.

To avoid disturbing other people's workflows, this feature is disabled by default (so even large blobs are still stored internally in the database). To enable it, set the TRILIUM_EXTERNAL_BLOB_STORAGE environment variable.

The size threshold for storing a blob externally rather than internally is 100 KB by default and can be changed with the TRILIUM_EXTERNAL_BLOB_THRESHOLD environment variable. This way small blobs are still read from the database, which is more efficient than storing all blobs in the filesystem.
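As a rough illustration of the rule described above, the decision could look something like the sketch below. Only the two environment variable names come from this description; the helper names, the accepted values for enabling the feature, and the assumption that the threshold is given in bytes (with 100 KB read as 100 * 1024) are hypothetical and may differ from the actual implementation.

```typescript
// Hypothetical sketch; only the environment variable names come from the PR description.
type Env = Record<string, string | undefined>;

// Assumption: "100kb" is interpreted as 100 * 1024 bytes.
const DEFAULT_THRESHOLD_BYTES = 100 * 1024;

interface ExternalBlobConfig {
    enabled: boolean;
    thresholdBytes: number;
}

function readExternalBlobConfig(env: Env): ExternalBlobConfig {
    // Assumption: any non-empty value other than "false" or "0" enables the feature.
    const raw = (env.TRILIUM_EXTERNAL_BLOB_STORAGE ?? "").trim().toLowerCase();
    const enabled = raw !== "" && raw !== "false" && raw !== "0";

    // Assumption: the threshold is a plain number of bytes; invalid values fall back to the default.
    const parsed = Number.parseInt(env.TRILIUM_EXTERNAL_BLOB_THRESHOLD ?? "", 10);
    const thresholdBytes = Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_THRESHOLD_BYTES;

    return { enabled, thresholdBytes };
}

// Small blobs stay in SQLite; blobs at or above the threshold go to the filesystem.
function shouldStoreExternally(contentLength: number, config: ExternalBlobConfig): boolean {
    return config.enabled && contentLength >= config.thresholdBytes;
}
```

With the feature disabled (the default), `shouldStoreExternally` is false for every size, so existing setups keep storing everything in the database.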

Closes #6546

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 17, 2025
blobStorageService.deleteExternal(filePath);
}
} catch (error) {
// contentLocation column might not be present when applying older migrations
Author

I tried inlining some of these AbstractBeccaEntity in the 0233__migrate_geo_map_to_collection.ts migration, but some things are event based and the subscribers are using AbstractBeccaEntity as well.

The alternative would be to add a warning that people have to upgrade to the previous Trilium version first and wait for migrations to complete, but I wanted to avoid that.

@alexdachin alexdachin marked this pull request as draft August 17, 2025 11:14
@alexdachin alexdachin changed the title Allow external blobs Feat/Allow external blobs Aug 17, 2025
@alexdachin alexdachin changed the title Feat/Allow external blobs feat: allow external blobs Aug 17, 2025
@alexdachin alexdachin force-pushed the external-blobs branch 2 times, most recently from 6deb934 to f72f0fd Compare August 17, 2025 11:54
@alexdachin alexdachin marked this pull request as ready for review August 17, 2025 12:44
@perfectra1n
Member

Can you explain more as to what it’s doing, how it works, and what it’d be useful for? Looks cool at first glance! :)

@alexdachin
Author

> Can you explain more as to what it’s doing, how it works, and what it’d be useful for? Looks cool at first glance! :)

Thanks, I updated the PR description.

There is more context in the issue linked in the description, but I tried to summarize everything here too.

Contributor

@eliandoran eliandoran left a comment

As a start, the implementation seems pretty good.

There is a bug when importing large files as code. Try importing this zip with external storage enabled: it results in a 9 MB JSON file, but the note is empty when you open it.

Trace-20250708T195034.json.zip

@eliandoran eliandoran marked this pull request as draft August 19, 2025 18:19
@adoriandoran
Member

I have noticed an increased risk of BLOB key collisions when the server is running on Windows. The BLOB key is case-sensitive, whereas the underlying file system is case-insensitive.
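To make the collision concrete, here is a small sketch that finds keys which are distinct in the database but would map to the same file on a case-insensitive file system. The function name is hypothetical, and `toLowerCase()` is only a simplified stand-in for how a real file system folds case.

```typescript
// Hypothetical helper: groups keys that differ only by letter case.
// toLowerCase() is a simplified model of file system case folding.
function findCaseInsensitiveCollisions(keys: string[]): string[][] {
    const groups = new Map<string, string[]>();
    for (const key of keys) {
        const folded = key.toLowerCase();
        const group = groups.get(folded) ?? [];
        if (!group.includes(key)) {
            group.push(key);
        }
        groups.set(folded, group);
    }
    // Only groups with more than one distinct key are collisions.
    return Array.from(groups.values()).filter((group) => group.length > 1);
}
```

Two case-sensitive keys in the same group would overwrite each other's file if the key were used directly as the file name.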

@adoriandoran
Member

Once the data directory is relocated, the external BLOB paths break, since they are relative to Trilium’s directory. I suggest storing these paths in the database as relative to the “external-blobs” directory instead.
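A minimal sketch of this suggestion, assuming the database stores only the path relative to the “external-blobs” directory and the absolute path is computed at read time from the current data directory. The function name and the traversal check are illustrative additions, not taken from the PR.

```typescript
import * as path from "node:path";

// Directory name as quoted in the comment above.
const EXTERNAL_BLOBS_DIRNAME = "external-blobs";

// Hypothetical helper: resolves a stored relative path against the current data directory,
// so relocating the data directory does not invalidate the paths stored in the database.
function resolveExternalBlobPath(dataDir: string, storedRelativePath: string): string {
    const baseDir = path.resolve(dataDir, EXTERNAL_BLOBS_DIRNAME);
    const absolutePath = path.resolve(baseDir, storedRelativePath);

    // Extra safety check (an assumption, not from the PR): refuse paths that escape the base directory.
    const relative = path.relative(baseDir, absolutePath);
    const escapes = relative === "" || relative === ".." || relative.startsWith(".." + path.sep) || path.isAbsolute(relative);
    if (escapes) {
        throw new Error(`Invalid external blob path: ${storedRelativePath}`);
    }
    return absolutePath;
}
```

The same stored value then resolves correctly under both the old and the new data directory.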

@alexdachin
Author

I see, thank you all for the feedback! I'm away for a few days, but I'll have a look as soon as I get back.

@perfectra1n perfectra1n changed the title feat: allow external blobs feat(db): allow external sqlite blobs Aug 23, 2025
@alexdachin alexdachin force-pushed the external-blobs branch 2 times, most recently from 6ca2066 to c0635e5 Compare August 27, 2025 17:11
@alexdachin
Author

alexdachin commented Aug 27, 2025

Thank you again for checking this out!

I found out what the issue was with the large JSON file and fixed it.

Also, good points regarding case-insensitive file systems: I switched to a random UUID instead and now store the relative path.

Do you mind checking it one more time? 🙏
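For readers following along, the UUID approach could be sketched roughly as below. `randomUUID` from `node:crypto` returns a lowercase string, so two generated names never differ only by case. The two-character subdirectory and the `.blob` extension are assumptions made for illustration and are not confirmed by the PR.

```typescript
import { randomUUID } from "node:crypto";
import * as path from "node:path";

// Hypothetical helper: generates the relative path that would be stored in the database.
function newExternalBlobRelativePath(): string {
    const id = randomUUID();
    // Assumption: shard files into subdirectories by the first two characters
    // to avoid one very large directory. Forward slashes keep the stored value portable.
    return path.posix.join(id.slice(0, 2), `${id}.blob`);
}
```

Because the name no longer depends on the case-sensitive blob key, the result is safe on case-insensitive file systems.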

@alexdachin alexdachin marked this pull request as ready for review August 31, 2025 18:29
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Aug 31, 2025
@alexdachin alexdachin requested a review from eliandoran August 31, 2025 18:29
@eliandoran
Contributor

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant feature to allow storing large blobs in the filesystem instead of the database, which is a great step towards improving performance and manageability for large Trilium instances. The implementation is well-thought-out, especially the handling of migrations and configuration options. I've identified a few areas for improvement, primarily concerning file system operations' atomicity and performance. There are potential race conditions in file deletion that could lead to orphaned files, and some synchronous I/O operations could block the server's event loop. Additionally, there are opportunities to optimize database queries related to migration checks. Addressing these points will make the feature more robust and performant.

fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
}

fs.writeFileSync(absolutePath, content, { mode: 0o600 });
Contributor

medium

fs.writeFileSync is a synchronous I/O operation that will block the Node.js event loop until the file is fully written. For a server application, this can negatively impact performance and responsiveness, especially with large files. While changing this to asynchronous I/O (fs.promises.writeFile) would require a larger refactoring of the call chain to be async, it's the recommended approach for a high-performance server.

Author

Good point that async would be ideal. Currently the whole app uses better-sqlite3, which is synchronous. Before this change, blobs were stored in SQLite and read and written with blocking calls, so this maintains the same pattern, just with filesystem storage instead.

Making this async would mean refactoring the call chain (and potentially switching to an async SQLite library), so I figured it shouldn't be in scope for this feature. What do you think?
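Staying synchronous does not rule out making the write itself safer. As a sketch of one common technique (not necessarily what this PR does), the content can be written to a temporary file in the same directory and then renamed into place, so a crash mid-write does not leave a truncated file at the final path. The function name and the `.tmp-` prefix are hypothetical; the `mode` values mirror the ones in the diff above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { randomUUID } from "node:crypto";

// Hypothetical helper: synchronous write via a temporary file followed by a rename.
function writeBlobFileSync(absolutePath: string, content: Buffer | string): void {
    const dir = path.dirname(absolutePath);
    fs.mkdirSync(dir, { recursive: true, mode: 0o700 });

    // The temp file lives in the same directory so the rename stays on one file system.
    const tempPath = path.join(dir, `.tmp-${randomUUID()}`);
    try {
        fs.writeFileSync(tempPath, content, { mode: 0o600 });
        fs.renameSync(tempPath, absolutePath);
    } catch (error) {
        // Clean up the temp file if anything failed; force ignores a missing file.
        fs.rmSync(tempPath, { force: true });
        throw error;
    }
}
```

This still blocks the event loop for the duration of the write, so it only addresses partial files, not the responsiveness concern raised by the bot.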

Contributor

@eliandoran eliandoran left a comment

There are two bugs:

The first is a critical one (data loss):

  1. Import the Trace.json file. It will get stored as an external blob.
  2. Replace the entire text with a few characters to delete the external blob.
  3. Paste the Trace.json back in (or any big text to trigger saving).
  4. Restart the server.
  5. The server will restart with something like:
    Consistency issue fixed: Note '5zzWKjTnXz82' content was set to '' since it was null even though it is not deleted
    Consistency issue fixed: Note 'i2EbsDSQkdeb' content was set to '{}' since it was null even though it is not deleted
    
  6. Look at the Trace.json note, the content will be replaced with {}.

The second one involves uploading a video. Regardless of the video size, it will not be saved as an external blob.

In addition, please address the topics from the bot.

@eliandoran eliandoran marked this pull request as draft October 21, 2025 07:57
@capi
Contributor

capi commented Oct 21, 2025

This makes backups a bit more complicated. Currently you can back up by backing up the SQLite database, which can be done in a transaction-safe way via e.g. VACUUM INTO or other SQLite mechanisms.

With this change, you need to either stop the process before the backup or take a consistent filesystem snapshot.

That said, I really think this is a sensible thing to do. It should just be pointed out in the backup documentation.

@alexdachin alexdachin force-pushed the external-blobs branch 4 times, most recently from d87e022 to 95d96f1 Compare November 10, 2025 15:59
}

getAttachment(attachmentId: string, opts: AttachmentOpts = {}): BAttachment | null {
opts.includeContentLength = !!opts.includeContentLength;
Author

The next step would probably be removing this parameter, since getting the content length is a very cheap operation now, but I didn't want to increase the size of this PR and make it harder to review.

This change can go in a follow-up PR.

@alexdachin
Author

Thanks a lot for the review @eliandoran and @capi!

That critical bug was a really good catch. Sorry for missing it initially; I wasn't familiar with the consistency checks Trilium is doing. I couldn't replicate your issue regarding video uploads, though. Do you mind checking one more time? I tried different video formats and they all follow the threshold for me.

Really good point regarding backups. I updated the documentation to reflect that.

@alexdachin alexdachin marked this pull request as ready for review November 11, 2025 03:22
Labels

merge-conflicts size:XL This PR changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Store file notes in the filesystem rather than the database

5 participants