
Conversation

drahnr (Contributor) commented Nov 28, 2025

Part 2/3 of #1185, laying the groundwork for #1354.

⚠️ The PR is rather large; many of the changes touch large pieces of code across the codebase.

Intent

  • Provide the baseline for cheaply querying partial storage maps, but do not implement it yet
  • Prepare for potential deprecation of AccountInfo

core changes

  1. Remove the vault / storage_map BLOB entries from the accounts table.
  2. Add SmtForest and integrate it into apply_block and State::load

significant changes in the following files:

  • crates/store/src/db/schema.rs introduces account_storage_headers and removes storage (the full serialized account storage) from the accounts table
  • crates/store/src/state.rs: State::apply_block now updates not only the database but also the SmtForest's lookup table of roots and the entries in the forest itself, via indirect lookup tables (sketched below)
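For orientation, a rough sketch of the new ordering (illustrative only: the block accessors and the db call are assumptions; update_vaults_in_forest and update_storage_maps_in_forest are the helpers discussed in the review below):

// Sketch, not the actual implementation: persist the block first, then
// bring the SmtForest and its root lookup tables in sync.
async fn apply_block_sketch(state: &State, block: &ProvenBlock) -> Result<(), ApplyBlockError> {
    // Hypothetical accessors: collect the accounts this block touched.
    let changed: Vec<AccountId> =
        block.updated_accounts().iter().map(|u| u.account_id()).collect();
    let block_num = block.header().block_num();

    state.db.apply_block(block).await?;
    state.update_vaults_in_forest(&changed, block_num).await?;
    state.update_storage_maps_in_forest(&changed, block_num).await?;
    Ok(())
}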

out of scope

how to review

Take the existing TODOs into consideration and whether they make sense; they will be addressed in the follow-up PR.

drahnr force-pushed the bernhard-integrate-smtforest branch 2 times, most recently from 7980bed to 771f6d7 on December 2, 2025 at 11:16
drahnr force-pushed the bernhard-integrate-smtforest branch from 771f6d7 to b2aa54b on December 2, 2025 at 15:22
drahnr force-pushed the bernhard-integrate-smtforest branch from b2aa54b to 2964a93 on December 2, 2025 at 15:54

/// Updates storage map SMTs in the forest for changed accounts
#[allow(clippy::too_many_lines)]
async fn update_storage_maps_in_forest(
drahnr (Contributor, Author) commented Dec 2, 2025

Reviewing this one would be very helpful

Collaborator commented

It looks okay with a light pass; I'll try more tomorrow.

Mirko-von-Leipzig (Collaborator) left a comment

Mostly questions; no real issues found as yet



drahnr marked this pull request as ready for review on December 5, 2025 at 00:13
drahnr requested a review from sergerad on December 5, 2025 at 08:25
account_commitment BLOB NOT NULL,
code_commitment BLOB,
storage BLOB,
vault BLOB,
Contributor commented

Question: what is the benefit of storing vault and storage headers in separate tables? I was thinking this would work as follows:

  • vault becomes just vault_root. This should be strictly better than introducing the account_vault_headers table.
  • storage becomes storage_header - a blob with the serialized storage header. Depending on how we use this data, this may or may not be a good idea. Or we may also need to add something like storage_commitment. (See the sketch below.)
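For illustration, the accounts columns quoted above could then become (a sketch; the new column names are suggestions, not final):

account_commitment BLOB NOT NULL,
code_commitment BLOB,
vault_root BLOB,       -- just the root, replacing the full vault blob
storage_header BLOB,   -- serialized AccountStorageHeader, replacing storage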

drahnr (Contributor, Author) commented Dec 10, 2025

I don't think using the unversioned, implementation-defined serialization format and blob fields makes sense; it asks for future trouble. Given the number of fields, it seemed logical to me to move to a separate table. vault followed in symmetry, but given it's only a single value, I can also just inline it into the accounts table again.

My bigger concern was how to match the tables: is (account_id, block_num) a good primary key? Good enough, maybe; whether it is good in absolute terms is less clear. So I think adding a storage_commitment column to the accounts table might make sense, similar to account_code.

Contributor commented

> I don't think using the unversioned, implementation-defined serialization format and blob fields makes sense; it asks for future trouble. Given the number of fields, it seemed logical to me to move to a separate table. vault followed in symmetry, but given it's only a single value, I can also just inline it into the accounts table again.

This makes sense, but I do think splitting this into multiple tables comes with quite a bit of extra complexity (querying, insertion, cleanup) and performance overhead (the datasets will be pretty large, and I'm assuming joining tables would not be negligible).

AFAICT, putting vault_root into accounts comes without any downsides - so, I think we should do it.

For the storage header, I agree that using the serialization format defined in miden-base introduces some risk, but I think a simpler way to handle it is to write custom serialization/deserialization for AccountStorageHeader here. This would give us the same amount of control but would reduce complexity and improve performance.

I don't think we need to write this custom serialization in this PR. I'd do the following:

  • Replace the table with a storage_header field holding a simple serialization of AccountStorageHeader (see the sketch below).
  • Create a follow-up issue to write the custom serialization/deserialization logic for AccountStorageHeader.
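A minimal sketch of what such a hand-rolled, versioned encoding could look like (SlotEntry is an illustrative stand-in for whatever AccountStorageHeader exposes; none of this is actual miden-base API):

// Hypothetical versioned encoding, decoupled from miden-base's
// unversioned Serializable format.
const STORAGE_HEADER_FORMAT_V1: u8 = 1;

struct SlotEntry {
    slot_type: u8,
    commitment: [u8; 32],
}

fn encode_storage_header(slots: &[SlotEntry]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + slots.len() * 33);
    out.push(STORAGE_HEADER_FORMAT_V1);
    out.push(slots.len() as u8); // assumes at most 255 storage slots
    for s in slots {
        out.push(s.slot_type);
        out.extend_from_slice(&s.commitment);
    }
    out
}

fn decode_storage_header(bytes: &[u8]) -> Option<Vec<SlotEntry>> {
    let (&version, rest) = bytes.split_first()?;
    if version != STORAGE_HEADER_FORMAT_V1 {
        return None; // unknown format version
    }
    let (&len, mut rest) = rest.split_first()?;
    let mut slots = Vec::with_capacity(len as usize);
    for _ in 0..len {
        if rest.len() < 33 {
            return None; // truncated input
        }
        let mut commitment = [0u8; 32];
        commitment.copy_from_slice(&rest[1..33]);
        slots.push(SlotEntry { slot_type: rest[0], commitment });
        rest = &rest[33..];
    }
    Some(slots)
}

A version byte up front is what makes future format migrations cheap, which is the main thing the miden-base format does not give us.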

Comment on lines +352 to +353
/// Represents a list of assets, if the number of assets is reasonably small, which
/// is currently set to 1000 for no particular reason.
Contributor commented

Currently, every asset is 32 bytes, though we are likely to increase this to 64 bytes in the future. So, 1K assets would be about 64KB of data. It feels a bit high, so maybe we should reduce this, but I don't have a good rationale for what exactly.

Comment on lines +402 to +406
let too_many_assets = entries.len() > Self::MAX_RETURN_ENTRIES;

if too_many_assets {
return Ok(Self::too_many());
}
Contributor commented

Not for this PR, but we should probably have a way to avoid reading all assets from the database when the number of assets is too big. So, as a future optimization, we could also store num_assets in the accounts table (next to vault_root), though it will require updating this field as the number of assets changes, which will not be trivial.

drahnr (Contributor, Author) commented Dec 10, 2025

It might be easier to add a subquery with a LIMIT n+1 clause, where n is our actual desired limit, to mitigate the largish memory overhead.
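For example (illustrative SQL; table and column names are assumptions, with n = MAX_RETURN_ENTRIES = 1000):

-- Fetch at most n+1 rows: if n+1 come back, answer with too_many()
-- without ever materializing the full vault in memory.
SELECT vault_key, asset
FROM account_vault_entries
WHERE account_id = ?1 AND block_num <= ?2
LIMIT 1001;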

Comment on lines 470 to 481
/// ## Future Enhancement (TODO)
///
/// Currently, when `too_many_entries = true`, we return an empty list. A future improvement
/// would return a **partial SMT** with:
/// - A subset of entries (e.g., most frequently accessed)
/// - Merkle proofs for those entries
/// - Inner node commitments
///
/// This would allow clients to verify partial data cryptographically while still
/// signaling that more data exists. The reason this matters: if all leaf values are
/// included, one can reconstruct the entire SMT; if even one is missing, one cannot.
/// By providing proofs, we enable verification of partial data.
Contributor commented

I'm not sure I'd put this into comments - but we should create an issue for this (unless we already have one).

Overall, I don't think we'll proactively try to return a part of the map, but rather would return a partial map when the user requests values for specific keys.
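Request-driven, that could look roughly like this (a sketch assuming an SMT with a per-key open() returning a proof, as in miden-crypto; the surrounding function and import paths are illustrative):

use miden_crypto::merkle::{Smt, SmtProof}; // paths assumed
use miden_crypto::Word;

/// Illustrative: return only the requested keys, each with an opening
/// proof, so the client can verify the partial data against the map root.
fn open_requested_keys(smt: &Smt, keys: &[Word]) -> Vec<(Word, SmtProof)> {
    keys.iter().map(|key| (*key, smt.open(key))).collect()
}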

Comment on lines +524 to +525
let too_many_entries = map_entries.len() > Self::MAX_RETURN_ENTRIES;
let map_entries = if too_many_entries { Vec::new() } else { map_entries };
Contributor commented

Similar to one of the above comments: we should probably optimize this in the future to avoid reading full maps from the database when there are too many entries - but that would be even more tricky than doing this for the asset vault.

Comment on lines +1344 to +1347
/// This reconstructs the `AccountHeader` by joining multiple tables:
/// - `accounts` table for `account_id`, `nonce`, `code_commitment`
/// - `account_vault_headers` table for `vault_root`
/// - `account_storage_headers` table for storage slot commitments (to compute `storage_commitment`)
Contributor commented

Related to some of the above comments: ideally, we'd be able to read the full account header just from the accounts table. This should simplify queries like this, and I'm not seeing downsides to doing it.

Comment on lines +1296 to +1313
/// Queries the account code for a specific account at a specific block number.
///
/// Returns `None` if:
/// - The account doesn't exist at that block
/// - The account has no code (private account or account without code commitment)
///
/// # Arguments
///
/// * `conn` - Database connection
/// * `account_id` - The account ID to query
/// * `block_num` - The block number at which to query the account code
///
/// # Returns
///
/// * `Ok(Some(Vec<u8>))` - The account code bytes if found
/// * `Ok(None)` - If account doesn't exist or has no code
/// * `Err(DatabaseError)` - If there's a database error
pub(crate) fn select_account_code_at_block(
Contributor commented

Would be awesome to add SQL statements describing these queries (as we do for some similar methods).
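For instance, something like this could document select_account_code_at_block (a sketch; the account_code table and join are assumptions based on the schema changes in this PR):

-- Latest account row at or before the given block, joined to its code.
SELECT account_code.code
FROM accounts
JOIN account_code ON account_code.commitment = accounts.code_commitment
WHERE accounts.account_id = ?1
  AND accounts.block_num <= ?2
ORDER BY accounts.block_num DESC
LIMIT 1;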

Ok(result)
}

/// Queries the account header for a specific account at a specific block number.
Contributor commented

I'm not sure this description is correct. I think what we are getting here is the first account header with a block number smaller than or equal to the block_num parameter, right?

Comment on lines +1257 to +1263
fn compute_storage_commitment(slot_commitments: &[Word]) -> Word {
use miden_objects::crypto::hash::rpo::Rpo256;

let elements: Vec<Felt> = slot_commitments.iter().flat_map(|w| w.iter()).copied().collect();

Rpo256::hash_elements(&elements)
}
Contributor commented

We should try to figure out how to use AccountStorageHeader here - otherwise this will be quite brittle.

For example, we could store the result of AccountStorageHeader::to_vec() in accounts.storage_header, and then deserialize the bytes into AccountStorageHeader and get the commitment from it directly (sketched below).
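Sketched out (the import path and the commitment accessor are assumptions about the miden-objects API, not verified):

use miden_objects::account::AccountStorageHeader; // path assumed

// Hypothetical round trip: accounts.storage_header bytes -> header -> commitment.
fn storage_commitment_from_blob(blob: &[u8]) -> Result<Word, DeserializationError> {
    let header = AccountStorageHeader::read_from_bytes(blob)?;
    Ok(header.compute_commitment()) // assumed accessor name
}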

Comment on lines +1222 to +1230
let raw: Vec<(Vec<u8>, Option<Vec<u8>>)> = SelectDsl::select(
t::table
.filter(t::account_id.eq(&account_id_bytes))
.filter(t::block_num.le(block_num_sql))
.order(t::block_num.desc())
.limit(1),
(t::vault_key, t::asset),
)
.load(conn)?;
Contributor commented

Would this not always return just one entry (because of LIMIT 1)? I think we may need to get all entries with block_num smaller than or equal to the passed-in block number and then perform the filtering manually, but maybe there is a better way to do it.
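That manual filtering could look like this (illustrative; assumes the query also selects block_num and that a NULL asset marks a removed entry):

use std::collections::HashMap;
use std::collections::hash_map::Entry;

// Keep only the newest row per vault key, then drop keys whose newest
// row has no asset (the asset was removed at or before the block).
fn latest_per_vault_key(
    rows: Vec<(Vec<u8>, Option<Vec<u8>>, u32)>, // (vault_key, asset, block_num)
) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut latest: HashMap<Vec<u8>, (Option<Vec<u8>>, u32)> = HashMap::new();
    for (key, asset, block_num) in rows {
        match latest.entry(key) {
            Entry::Occupied(mut o) => {
                if block_num >= o.get().1 {
                    o.insert((asset, block_num));
                }
            }
            Entry::Vacant(v) => {
                v.insert((asset, block_num));
            }
        }
    }
    latest
        .into_iter()
        .filter_map(|(key, (asset, _))| asset.map(|a| (key, a)))
        .collect()
}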

Comment on lines +660 to +665
// Query latest storage headers for this account
let headers: Vec<AccountStorageHeaderRaw> =
SelectDsl::select(t::table, AccountStorageHeaderRaw::as_select())
.filter(t::account_id.eq(&account_id_bytes).and(t::is_latest.eq(true)))
.order(t::slot_index.asc())
.load(conn)?;
Contributor commented

As mentioned in the previous comments, I would probably read the storage header directly from the accounts table.

Comment on lines +519 to +533
// Query all entries for this slot at or before the given block
let raw: Vec<(Vec<u8>, Vec<u8>)> = SelectDsl::select(t::table, (t::key, t::value))
.filter(
t::account_id
.eq(&account_id_bytes)
.and(t::slot.eq(slot_sql))
.and(t::block_num.le(block_num_sql)),
)
.load(conn)?;

// Parse entries
let entries: Vec<(Word, Word)> = raw
.into_iter()
.map(|(k, v)| Ok((Word::read_from_bytes(&k)?, Word::read_from_bytes(&v)?)))
.collect::<Result<Vec<_>, DatabaseError>>()?;
Contributor commented

Wouldn't this result in potentially multiple entries being read from the database? This by itself is fine (or at least I don't see a way to avoid it), but I think we do need to filter the entries here to make sure we don't have more than one entry per key (i.e., we should select the last entry for each key).
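Alternatively, the per-key dedup could be pushed into SQL with a grouped subquery (illustrative; same caveats about table and column names as above):

-- For each key, pick the row from the highest block_num at or before
-- the target block.
SELECT e.key, e.value
FROM account_storage_map_entries e
JOIN (
    SELECT key, MAX(block_num) AS max_block
    FROM account_storage_map_entries
    WHERE account_id = ?1 AND slot = ?2 AND block_num <= ?3
    GROUP BY key
) latest ON latest.key = e.key AND latest.max_block = e.block_num
WHERE e.account_id = ?1 AND e.slot = ?2;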

bobbinth (Contributor) left a comment

Looks good! Thank you! I left some comments above, but the main things are:

  • It is not clear to me if there is much value in having the account_storage_headers and account_vault_headers tables. I think we can store the headers directly in the accounts table, which should simplify the code and make querying more efficient.
  • I think some of the queries may not be working correctly - specifically the ones that try to reconstruct account vault or storage map at a given block. The main reason here is that I think we need to read more data from the DB and then filter it in memory to remove potential duplicates. But it is also possible that I missed something.

Comment on lines +629 to +633
async fn update_vaults_in_forest(
&self,
changed_account_ids: &[AccountId],
block_num: BlockNumber,
) -> Result<(), ApplyBlockError> {
Contributor commented

Would it be possible to move the InnerForest struct into a separate file and then also attach these methods to that struct? For example, something like:

impl InnerForest {
    pub async fn update_vaults(
        &self,
        db: Arc<Db>,
        changed_account_ids: &[AccountId],
        block_num: BlockNumber,
    ) -> Result<..> {
        ...
    }
}
