[feat] integrate smtforest, avoid ser/de of full account/vault data in database #1394
base: next
Conversation
```rust
/// Updates storage map SMTs in the forest for changed accounts
#[allow(clippy::too_many_lines)]
async fn update_storage_maps_in_forest(
```
Reviewing this one would be very helpful
It looks okay with a light pass; I'll try more tomorrow.
Mirko-von-Leipzig left a comment:
Mostly questions; no real issues found as yet
```sql
account_commitment BLOB NOT NULL,
code_commitment BLOB,
storage BLOB,
vault BLOB,
```
Question: what is the benefit of storing vault and storage headers in separate tables? I was thinking this would work as follows:

- `vault` becomes just `vault_root`. This should be strictly better than introducing the `account_vault_headers` table.
- `storage` becomes `storage_header` - a blob with the serialized storage header. Depending on how we use this data, this may or may not be a good idea. Or we may also need to add something like `storage_commitment`. (A schema sketch of this consolidated layout follows below.)
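A minimal sketch of that consolidated layout, written as a diesel `table!` definition in the style this store uses; everything beyond the `vault_root`/`storage_header`/`storage_commitment` columns proposed here is an assumption:

```rust
// Hypothetical consolidated schema -- a sketch of the proposal above,
// not the actual migration. Column names other than those discussed
// in this thread are assumptions.
diesel::table! {
    accounts (account_id, block_num) {
        account_id -> Binary,
        block_num -> BigInt,
        account_commitment -> Binary,
        code_commitment -> Nullable<Binary>,
        // replaces the separate `account_vault_headers` table
        vault_root -> Binary,
        // serialized `AccountStorageHeader`; replaces `account_storage_headers`
        storage_header -> Nullable<Binary>,
        // optional, so the commitment can be read without deserializing the blob
        storage_commitment -> Nullable<Binary>,
    }
}
```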
I don't think using the unversioned, implementation-defined serialization format and blob fields makes sense; it asks for future trouble. Given the number of fields, it seemed logical to me to move to a separate table. `vault` followed in symmetry, but given it's only a single value I can also just inline it into the `accounts` table again.

My bigger concern was how to match the tables: is `(account_id, block_num)` a good primary key? Good enough, yes; good in absolute terms, maybe. So I think adding a `storage_commitment` column to the `accounts` table might make sense, similar to `account_code`.
> I don't think using the unversioned, implementation-defined serialization format and blob fields makes sense; it asks for future trouble. Given the number of fields, it seemed logical to me to move to a separate table. `vault` followed in symmetry, but given it's only a single value I can also just inline it into the `accounts` table again.
This makes sense, but I do think splitting this into multiple tables comes with quite a bit of extra complexity (querying, insertion, cleanup) and performance overhead (the datasets will be pretty large, and I'm assuming joining tables would not be negligible).

AFAICT, putting `vault_root` into `accounts` comes without any downsides - so, I think we should do it.

For the storage header, I agree that using the serialization format defined in miden-base introduces some risk, but I think a simpler way to handle it is to write custom serialization/deserialization for `AccountStorageHeader` here. This would give us the same amount of control, but would reduce complexity and improve performance.

I don't think we need to write this custom serialization in this PR. I'd do the following:

- Replace the table with a `storage_header` field that would be a simple serialization of `AccountStorageHeader`.
- Create a follow-up issue to write the custom serialization/deserialization logic for `AccountStorageHeader` (a rough sketch of such an encoding follows below).
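As a loose illustration of that follow-up item, a versioned, store-owned encoding could look something like this; the `(slot_type, value)` layout is an assumption about `AccountStorageHeader`'s contents, not its actual API:

```rust
// Sketch of a store-owned, versioned encoding for the storage header,
// independent of miden-base's `Serializable` impl. The slot layout
// (one type byte + one 32-byte value per slot) is an assumption.
const STORAGE_HEADER_FORMAT_V1: u8 = 1;

fn encode_storage_header(slots: &[(u8, [u8; 32])]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + 4 + slots.len() * 33);
    out.push(STORAGE_HEADER_FORMAT_V1);
    out.extend_from_slice(&(slots.len() as u32).to_le_bytes());
    for (slot_type, value) in slots {
        out.push(*slot_type);
        out.extend_from_slice(value);
    }
    out
}

fn decode_storage_header(bytes: &[u8]) -> Option<Vec<(u8, [u8; 32])>> {
    let (&version, rest) = bytes.split_first()?;
    if version != STORAGE_HEADER_FORMAT_V1 {
        return None;
    }
    if rest.len() < 4 {
        return None;
    }
    let (len_bytes, mut rest) = rest.split_at(4);
    let len = u32::from_le_bytes(len_bytes.try_into().ok()?) as usize;
    // Guard against bogus length prefixes before allocating.
    if rest.len() != len.checked_mul(33)? {
        return None;
    }
    let mut slots = Vec::with_capacity(len);
    for _ in 0..len {
        let (entry, tail) = rest.split_at(33);
        slots.push((entry[0], entry[1..].try_into().ok()?));
        rest = tail;
    }
    Some(slots)
}
```

Versioning the format up front is the point: the blob can then evolve independently of whatever miden-base's derive-based serialization does.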
```rust
/// Represents a list of assets, if the number of assets is reasonably small, which
/// is currently set to 1000 for no particular reason.
```
Currently, every asset is 32 bytes - though we are likely to increase this to 64 bytes in the future. So, 1K assets would be about 64KB of data. That feels a bit high, so maybe we should reduce the limit, but I don't have a good rationale for what it should be exactly.
```rust
let too_many_assets = entries.len() > Self::MAX_RETURN_ENTRIES;

if too_many_assets {
    return Ok(Self::too_many());
}
```
Not for this PR, but we should probably have a way not to read all assets from the database in case the number of assets is too big. So, as a future optimization, we could also store `num_assets` in the `accounts` table (next to `vault_root`) - though it will require updating this field as the number of assets changes, which will not be trivial.
It might be easier to add a subquery with a `LIMIT (n+1)` clause, where `n` is our actual desired limit, to mitigate the largish memory overhead.
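A rough sketch of that shape in the diesel style used in this PR (fetching one row past the limit so overflow is detectable; table and column names mirror the snippet above and are illustrative):

```rust
// Fetch at most MAX_RETURN_ENTRIES + 1 rows so we can detect "too many"
// without materializing the whole vault in memory.
let entries: Vec<(Vec<u8>, Vec<u8>)> = t::table
    .filter(t::account_id.eq(&account_id_bytes))
    .select((t::vault_key, t::asset))
    .limit((Self::MAX_RETURN_ENTRIES + 1) as i64)
    .load(conn)?;

if entries.len() > Self::MAX_RETURN_ENTRIES {
    return Ok(Self::too_many());
}
```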
crates/proto/src/domain/account.rs (comment on an outdated diff)
```rust
/// ## Future Enhancement (TODO)
///
/// Currently, when `too_many_entries = true`, we return an empty list. A future improvement
/// would return a **partial SMT** with:
/// - A subset of entries (e.g., most frequently accessed)
/// - Merkle proofs for those entries
/// - Inner node commitments
///
/// This would allow clients to verify partial data cryptographically while still
/// signaling that more data exists. The reason this matters: if all leaf values are
/// included, one can reconstruct the entire SMT; if even one is missing, one cannot.
/// By providing proofs, we enable verification of partial data.
```
I'm not sure I'd put this into comments - but we should create an issue for this (unless we already have one). Overall, I don't think we'll proactively try to return part of the map; rather, we'd return a partial map when the user requests values for specific keys.
```rust
let too_many_entries = map_entries.len() > Self::MAX_RETURN_ENTRIES;
let map_entries = if too_many_entries { Vec::new() } else { map_entries };
```
Similar to one of the above comments: we should probably optimize this in the future to avoid reading full maps from the database when there are too many entries - but that would be even more tricky than doing this for the asset vault.
```rust
/// This reconstructs the `AccountHeader` by joining multiple tables:
/// - `accounts` table for `account_id`, `nonce`, `code_commitment`
/// - `account_vault_headers` table for `vault_root`
/// - `account_storage_headers` table for storage slot commitments (to compute `storage_commitment`)
```
Related to some of the above comments: ideally, we'd be able to read the full account header just from the `accounts` table. This should simplify queries like this - and I'm not seeing downsides to doing this.
```rust
/// Queries the account code for a specific account at a specific block number.
///
/// Returns `None` if:
/// - The account doesn't exist at that block
/// - The account has no code (private account or account without code commitment)
///
/// # Arguments
///
/// * `conn` - Database connection
/// * `account_id` - The account ID to query
/// * `block_num` - The block number at which to query the account code
///
/// # Returns
///
/// * `Ok(Some(Vec<u8>))` - The account code bytes if found
/// * `Ok(None)` - If account doesn't exist or has no code
/// * `Err(DatabaseError)` - If there's a database error
pub(crate) fn select_account_code_at_block(
```
Would be awesome to add SQL statements describing these queries (as we do for some similar methods).
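For example, something along these lines; the SQL is a hand-written guess at what the diesel query compiles to, with assumed table names:

```rust
/// Queries the account code for a specific account at a specific block number.
///
/// Illustrative SQL (the real statement is generated by diesel; table and
/// column names here are assumptions):
///
///   SELECT code
///   FROM account_codes
///   JOIN accounts ON accounts.code_commitment = account_codes.code_commitment
///   WHERE accounts.account_id = ?1 AND accounts.block_num <= ?2
///   ORDER BY accounts.block_num DESC
///   LIMIT 1
pub(crate) fn select_account_code_at_block(/* ... */)
```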
```rust
    Ok(result)
}

/// Queries the account header for a specific account at a specific block number.
```
I'm not sure this description is correct. I think what we are getting here is the most recent account header with a block number smaller than or equal to the `block_num` parameter - right?
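If that is indeed the behavior, a doc wording along these lines might be clearer (suggested phrasing only):

```rust
/// Queries the latest account header recorded at or before `block_num`,
/// i.e. the state of the account as of that block, not only headers
/// written exactly at `block_num`.
```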
```rust
fn compute_storage_commitment(slot_commitments: &[Word]) -> Word {
    use miden_objects::crypto::hash::rpo::Rpo256;

    let elements: Vec<Felt> = slot_commitments.iter().flat_map(|w| w.iter()).copied().collect();

    Rpo256::hash_elements(&elements)
}
```
We should try to figure out how to use `AccountStorageHeader` here - otherwise this will be quite brittle. For example, we could store the result of `AccountStorageHeader::to_vec()` in `accounts.storage_header`, and then we could deserialize the bytes into `AccountStorageHeader` and get the commitment from it directly.
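A sketch of that round-trip; the concrete miden-base method names (`read_from_bytes` from `Deserializable`, and a commitment accessor on `AccountStorageHeader`) are assumptions here, not verified API:

```rust
// Read side: recover the commitment from the stored blob instead of
// re-hashing slot commitments by hand. Method names are assumed.
fn storage_commitment_from_blob(blob: &[u8]) -> Result<Word, DatabaseError> {
    let header = AccountStorageHeader::read_from_bytes(blob)?;
    Ok(header.compute_commitment())
}
```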
```rust
let raw: Vec<(Vec<u8>, Option<Vec<u8>>)> = SelectDsl::select(
    t::table
        .filter(t::account_id.eq(&account_id_bytes))
        .filter(t::block_num.le(block_num_sql))
        .order(t::block_num.desc())
        .limit(1),
    (t::vault_key, t::asset),
)
.load(conn)?;
```
Would this not always return just one entry (because of `LIMIT 1`)? I think we may need to get all entries with `block_num` smaller than or equal to the passed-in block number and then perform the filtering manually - but maybe there is a better way to do it.
```rust
// Query latest storage headers for this account
let headers: Vec<AccountStorageHeaderRaw> =
    SelectDsl::select(t::table, AccountStorageHeaderRaw::as_select())
        .filter(t::account_id.eq(&account_id_bytes).and(t::is_latest.eq(true)))
        .order(t::slot_index.asc())
        .load(conn)?;
```
As mentioned in the previous comments, I would probably read the storage header directly from the accounts table.
```rust
// Query all entries for this slot at or before the given block
let raw: Vec<(Vec<u8>, Vec<u8>)> = SelectDsl::select(t::table, (t::key, t::value))
    .filter(
        t::account_id
            .eq(&account_id_bytes)
            .and(t::slot.eq(slot_sql))
            .and(t::block_num.le(block_num_sql)),
    )
    .load(conn)?;

// Parse entries
let entries: Vec<(Word, Word)> = raw
    .into_iter()
    .map(|(k, v)| Ok((Word::read_from_bytes(&k)?, Word::read_from_bytes(&v)?)))
    .collect::<Result<Vec<_>, DatabaseError>>()?;
```
Wouldn't this result in potentially multiple entries being read from the database? This by itself is fine (or at least I don't see a way to avoid it), but I think we do need to filter the entries here to make sure we don't have more than one entry per block (i.e., we should select the last entry for each key).
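A sketch of that filtering step, assuming the query above is extended to also select `block_num` and that `Word` implements `Hash` (otherwise key the map on the raw byte encoding):

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Keep, for each key, only the value from the highest block number at or
// before the target block. Tuple layout `(key, value, block_num)` assumes
// the query is extended to select `block_num` as well.
let mut latest: HashMap<Word, (u32, Word)> = HashMap::new();
for (key, value, block_num) in entries {
    match latest.entry(key) {
        Entry::Occupied(mut e) => {
            if e.get().0 < block_num {
                e.insert((block_num, value));
            }
        }
        Entry::Vacant(e) => {
            e.insert((block_num, value));
        }
    }
}
let entries: Vec<(Word, Word)> =
    latest.into_iter().map(|(k, (_, v))| (k, v)).collect();
```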
bobbinth left a comment:
Looks good! Thank you! I left some comments above, but the main things are:

- It is not clear to me if there is much value in having the `account_storage_headers` and `account_vault_headers` tables. I think we can store the headers directly in the `accounts` table, which should simplify the code and make querying more efficient.
- I think some of the queries may not be working correctly - specifically the ones that try to reconstruct the account vault or storage map at a given block. The main reason is that I think we need to read more data from the DB and then filter it in memory to remove potential duplicates. But it is also possible that I missed something.
```rust
async fn update_vaults_in_forest(
    &self,
    changed_account_ids: &[AccountId],
    block_num: BlockNumber,
) -> Result<(), ApplyBlockError> {
```
Would it be possible to move the `InnerForest` struct into a separate file and then also attach these methods to that struct? For example, something like:

```rust
impl InnerForest {
    pub async fn update_vaults(
        &self,
        db: Arc<Db>,
        changed_account_ids: &[AccountId],
        block_num: BlockNumber,
    ) -> Result<..> {
        ...
    }
}
```
Part 2/3 of #1185; lays the groundwork for #1354.
Intent
Avoid ser/de of the full `AccountInfo`.

core changes

- remove the `vault`/`storage_map` BLOB entries from the accounts table
- introduce `SmtForest` and integrate it into `apply_block` and `State::load`

significant changes in the following files:

- `crates/store/src/db/schema.rs` introduces `account_storage_headers` and removes `storage` (the full serialized account storage) from the `accounts` table
- `crates/store/src/state.rs` / `fn State::apply_block` now updates not only the database, but also the lookup table of roots for the `SmtForest` and the entries in the forest itself (indirect lookup tables)

out of scope
- `LargeSmtForest` for partial storage maps - [devops] Deploy without nuking db #670
- `GetAccountProof` endpoint #1354
- `(Large|)SmtForest` - Add cleanup routine for outdated historical storage entries #1355

how to review
Take the existing TODOs into consideration and check whether they make sense; they will be addressed in the follow-up PR.