@landonxjames landonxjames commented Dec 3, 2025

Note: before this PR is merged, the S3 model must be reverted to the published one. It currently uses a custom S3 model carrying an endpointBdd trait, purely for CI testing and diff viewing.

Motivation and Context

Currently all AWS SDKs use a rules engine to resolve service endpoints before every operation. The current Endpoints 2.0 decision tree design creates significant problems that directly impact customers:

Larger SDK artifacts: S3's ruleset is 427 KB of JSON. The Rust SDK's generated resolver is 452 KB and nearly 6,000 lines of code. Customers download larger SDK packages and experience slower compilation times. Mobile and embedded applications face stricter size constraints that make AWS SDK adoption difficult.

Worse operation performance: Punishing fallback logic in decision tree evaluation requires 68 redundant condition checks for a simple S3 bucket+region endpoint. Customers experience slower operation initialization, particularly in serverless environments where cold starts are critical.

Description

This PR implements code generation for the new endpointBdd trait, which uses Binary Decision Diagrams (BDDs) to optimize endpoint resolution. BDDs provide a more compact representation of endpoint rules compared to the existing endpointRuleSet trait, reducing binary size and improving resolution performance. More information about the new trait can be found on the Smithy website: https://smithy.io/2.0/additional-specs/rules-engine/specification.html#smithy-rules-endpointbdd-trait

The entrypoint to the code generation is in EndpointsDecorator.kt where, if an endpointBdd trait exists, it is prioritized over the endpointRuleSet trait. This invokes the EndpointBddGenerator which orchestrates most of the code generation. The functions in the match arms of each of the condition blocks are generated by the BddExpressionGenerator class. The most recent diff of this generated code is available here.

The most notable new Rust code lives in the rust-runtime/inlineable/ crate:

Diving deeper into the code in the diff there are a few important new types to discuss:

struct BddNode {
    pub condition_index: i32,
    pub high_ref: i32,
    pub low_ref: i32,
}

enum ConditionFn {
    Cond0,
    Cond1,
    Cond2,
    ...
}

enum ResultEndpoint {
    Result0,
    Result1,
    Result2,
    ...
}

struct ConditionContext {
    partition_result: Option<crate::endpoint_lib::partition::Partition>,
    url: Option<crate::endpoint_lib::parse_url::Url>,
    region_prefix: Option<String>,
    ...
}
  • struct BddNode containing the condition_index, high_ref, and low_ref.
  • enum ConditionFn to represent the functions for evaluating each condition.
  • enum ResultEndpoint to represent the functions for generating each result.
  • struct ConditionContext to represent variables that can be assigned by Conditions

For each of BddNode, ConditionFn, and ResultEndpoint there is a constant sized array containing instances of each so they can be referenced by index:

const NODES: [BddNode; 807] = [
    BddNode {
        condition_index: -1,
        high_ref: 1,
        low_ref: -1,
    },
    BddNode {
        condition_index: 0,
        high_ref: 3,
        low_ref: 100000157,
    },
    BddNode {
        condition_index: 1,
        high_ref: 646,
        low_ref: 4,
    },
    ...
];

const CONDITIONS: [ConditionFn; 91] = [
    ConditionFn::Cond0,
    ConditionFn::Cond1,
    ConditionFn::Cond2,
    ...
];

const RESULTS: [ResultEndpoint; 158] = [
    ResultEndpoint::Result0,
    ResultEndpoint::Result1,
    ResultEndpoint::Result2,
    ...
];

ConditionFn implements a single function, evaluate, that takes in the modeled endpoint Params and the ConditionContext for passing around state between different condition evaluations, and a diagnostic_collector for tracking data about the evaluation.

impl ConditionFn {
    fn evaluate(
        &self,
        params: &Params,
        context: &mut ConditionContext,
        diagnostic_collector: &mut DiagnosticCollector,
    ) -> bool {
        // Param bindings
        let bucket = &params.bucket;
        let region = &params.region;
        ...

        // Non-Param references
        let partition_result = &mut context.partition_result;
        let url = &mut context.url;
        ...
        
        match self {
            Self::Cond0 => region.is_some(),
            Self::Cond1 => (accelerate) == (&true),
            Self::Cond2 => (use_fips) == (&true),
            Self::Cond3 => endpoint.is_some(),
            Self::Cond4 => bucket.is_some(),
            Self::Cond5 => {
                parse_arn(if let Some(param) = bucket { 
                    param 
                } else { 
                    return false 
                }, 
                diagnostic_collector)
                .is_some()
            }
            Self::Cond6 => {
                ("arn:")
                    == (coalesce!(
                        substring(
                            if let Some(param) = bucket { param } else { return false },
                            0,
                            4,
                            false,
                            diagnostic_collector
                        ),
                        "",
                    ))
            }
            ...
        }
    }
}
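
To illustrate how the ConditionContext carries state from one condition to a later one, here is a minimal sketch. The trimmed-down `Params`, `ConditionContext`, and the three conditions are hypothetical stand-ins, not the generated S3 code, and the `DiagnosticCollector` argument is left out for brevity.

```rust
// Hypothetical, trimmed-down stand-ins for the generated types.
struct Params {
    region: Option<String>,
}

#[derive(Default)]
struct ConditionContext {
    // Assigned by Cond1, read by Cond2.
    region_prefix: Option<String>,
}

enum ConditionFn {
    Cond0,
    Cond1,
    Cond2,
}

impl ConditionFn {
    fn evaluate(&self, params: &Params, context: &mut ConditionContext) -> bool {
        match self {
            // Pure check on a modeled parameter.
            Self::Cond0 => params.region.is_some(),
            // Computes a value, stores it in the context, and reports
            // whether the assignment produced a value.
            Self::Cond1 => {
                let Some(region) = params.region.as_deref() else {
                    return false;
                };
                context.region_prefix = region.split('-').next().map(str::to_owned);
                context.region_prefix.is_some()
            }
            // Reads the value that an earlier condition stored.
            Self::Cond2 => context.region_prefix.as_deref() == Some("us"),
        }
    }
}
```

In this sketch Cond2 only returns true once Cond1 has run, which is the kind of ordering dependency the shared context exists to support.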

ResultEndpoint also implements a single function, to_endpoint, that returns either an Endpoint or a ResolveEndpointError depending on which result is selected. Note that index 0 corresponds to the NoMatchRule.

impl ResultEndpoint {
    fn to_endpoint(
        &self,
        params: &Params,
        context: &ConditionContext,
    ) -> Result<Endpoint, ResolveEndpointError> {
        // Param bindings
        let bucket = params.bucket.as_ref().map(|s| s.clone()).unwrap_or_default();
        let region = params.region.as_ref().map(|s| s.clone()).unwrap_or_default();
        ...

        // Non-Param references
        let partition_result = context.partition_result.as_ref().map(|s| s.clone()).unwrap_or_default();
        let url = context.url.as_ref().map(|s| s.clone()).unwrap_or_default();
        ...

        match self {
            Self::Result0 => Result::Err(ResolveEndpointError::message("No endpoint rule matched")),
            Self::Result1 => Result::Err(ResolveEndpointError::message(
                "Accelerate cannot be used with FIPS".to_string(),
            )),
            ...
            Self::Result8 => Result::Ok(
                Endpoint::builder()
                    .url({
                        let mut out = String::new();
                        out.push_str(&url.scheme());
                        out.push_str("://");
                        out.push_str(&url.authority());
                        out.push('/');
                        out.push_str(&uri_encoded_bucket);
                        out.push_str(&url.path());
                        out
                    })
                    .property("backend", "S3Express".to_string())
                    .property(
                        "authSchemes",
                        vec![aws_smithy_types::Document::from({
                            let mut out = HashMap::<String, aws_smithy_types::Document>::new();
                            out.insert("disableDoubleEncoding".to_string(), true.into());
                            out.insert("name".to_string(), "sigv4".to_string().into());
                            out.insert("signingName".to_string(), "s3express".to_string().into());
                            out.insert("signingRegion".to_string(), region.to_owned().into());
                            out
                        })],
                    )
                    .build(),
            ),
            ...
        }
    }
}

The final important portion of our resolver is the evaluate_bdd function. This function is handwritten and will work with the generated nodes, conditions, and results for each service. This function takes 4 generic types: Cond for the generated conditions, Params for the modeled endpoint params, Res for the generated results, and Context for the shared condition context.

/// Evaluates a BDD to resolve an endpoint result
///
/// Arguments
/// * `nodes` - Array of BDD nodes
/// * `conditions` - Array of conditions referenced by nodes
/// * `results` - Array of possible results
/// * `root_ref` - Root reference to start evaluation
/// * `params` - Parameters for condition evaluation
/// * `context` - Values that can be set/mutated by the conditions
/// * `diagnostic_collector` - a struct for collecting information about the execution of conditions
/// * `condition_evaluator` - Function to evaluate individual conditions with params and context
///
/// Returns
/// * `Some(Res)` - Result if evaluation succeeds
/// * `None` - No match found (terminal reached)
pub fn evaluate_bdd<Cond, Params, Res: Clone, Context>(
    nodes: &[BddNode],
    conditions: &[Cond],
    results: &[Res],
    root_ref: i32,
    params: &Params,
    context: &mut Context,
    diagnostic_collector: &mut DiagnosticCollector,
    mut condition_evaluator: impl FnMut(
        &Cond,
        &Params,
        &mut Context,
        &mut DiagnosticCollector,
    ) -> bool,
) -> Option<Res> {
    let mut current_ref = root_ref;

    loop {
        match current_ref {
            // Result references (>= 100_000_000)
            ref_val if ref_val >= 100_000_000 => {
                let result_index = (ref_val - 100_000_000) as usize;
                return results.get(result_index).cloned();
            }
            // Terminals (1 = TRUE, -1 = FALSE) both map to the NoMatchRule at index 0
            1 | -1 => return results.get(0).cloned(),
            // Node references (1-based; a negative value marks a complement edge)
            ref_val => {
                let is_complement = ref_val < 0;
                let node_index = (ref_val.abs() - 1) as usize;

                let node = nodes.get(node_index)?;
                let condition_index = node.condition_index as usize;

                let condition = conditions.get(condition_index)?;
                let condition_result =
                    condition_evaluator(condition, params, context, diagnostic_collector);

                // Handle complement edges: complement inverts the branch selection
                current_ref = if is_complement ^ condition_result {
                    node.high_ref
                } else {
                    node.low_ref
                };
            }
        }
    }
}
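
As a worked example of the reference encoding used by evaluate_bdd, the sketch below walks a hand-built three-node BDD. The `walk` helper is a simplified stand-in written for this illustration: it follows the same branching rules, but it reads each condition outcome from a slice of booleans and returns the result index rather than a generated result value.

```rust
#[derive(Clone, Copy)]
struct BddNode {
    condition_index: i32,
    high_ref: i32,
    low_ref: i32,
}

// Simplified stand-in for evaluate_bdd: `answers[i]` is the outcome of
// condition `i`, and the return value is an index into the results array.
fn walk(nodes: &[BddNode], answers: &[bool], root_ref: i32) -> Option<usize> {
    let mut current = root_ref;
    loop {
        match current {
            // Result reference: subtract the offset to get the result index.
            r if r >= 100_000_000 => return Some((r - 100_000_000) as usize),
            // Terminals resolve to result 0 (the NoMatchRule).
            1 | -1 => return Some(0),
            // Node reference: 1-based, negative means complement edge.
            r => {
                let complement = r < 0;
                let node = nodes.get((r.abs() - 1) as usize)?;
                let outcome = *answers.get(node.condition_index as usize)?;
                current = if complement ^ outcome { node.high_ref } else { node.low_ref };
            }
        }
    }
}

// Hand-built example nodes (not taken from any real service model).
fn example_nodes() -> [BddNode; 3] {
    [
        // Index 0 is the terminal node, as in the generated NODES array.
        BddNode { condition_index: -1, high_ref: 1, low_ref: -1 },
        // Reference 2: condition 0 true -> reference 3, false -> result 1.
        BddNode { condition_index: 0, high_ref: 3, low_ref: 100_000_001 },
        // Reference 3: condition 1 true -> result 2, false -> result 3.
        BddNode { condition_index: 1, high_ref: 100_000_002, low_ref: 100_000_003 },
    ]
}
```

Starting from reference 2 with both conditions true, the walk visits nodes 2 and 3 and lands on result 2. Starting from the complemented reference -3 with condition 1 true, the branch choice is inverted and the walk lands on result 3 instead.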

TODO: There are a few tasks left for the BDD work to be fully complete:

  • Update the existing tests to include endpointBdd traits. This work is being done as part of the SEP process and will come in a subsequent PR.
  • Various optimization experiments. Currently the resolution is fast, but the aws_sdk_s3::config::endpoint::ResultEndpoint::to_endpoint symbol is still the largest one in my test binary. I would like to experiment with reducing the size there. This could look like splitting the match arms into separate functions, but a quick POC of that approach slightly increased the overall binary size even though it did reduce the size of the largest symbol.
  • Better diagnostic collection. It would be nice for the diagnostic collector to record the path through the BDD and the outcome of each Condition evaluation. This would help with debugging and understanding the performance of various cases. A standardized JSON format for this is one potential future extension of the SEP.
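
As a rough illustration of the last item, a path-recording collector could look like the sketch below. Everything here is hypothetical: `Step`, `PathCollector`, and the rendered format are invented for this example and are not part of the PR or the SEP.

```rust
/// Hypothetical record of one visited node; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Step {
    node_ref: i32,
    condition_index: i32,
    outcome: bool,
}

/// Hypothetical collector that remembers the path taken through the BDD.
#[derive(Default)]
struct PathCollector {
    steps: Vec<Step>,
}

impl PathCollector {
    /// Would be called once per node visit, after the condition is evaluated.
    fn record(&mut self, node_ref: i32, condition_index: i32, outcome: bool) {
        self.steps.push(Step { node_ref, condition_index, outcome });
    }

    /// Renders the path as "<ref>:c<condition>=<T|F>" segments joined by " > ".
    fn render(&self) -> String {
        self.steps
            .iter()
            .map(|s| {
                let flag = if s.outcome { 'T' } else { 'F' };
                format!("{}:c{}={}", s.node_ref, s.condition_index, flag)
            })
            .collect::<Vec<_>>()
            .join(" > ")
    }
}
```

The number of recorded steps would also give a direct count of condition evaluations per resolution, which is the figure the motivation section cares about.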

Testing

New minimal tests introduced for BDD, more to come as the SEP is fleshed out and I manage to compile some of the existing tests to BDD format.

Note: A few tests here are failing. I'm pretty sure all of those are due to differences in the S3 model I am using. I borrowed it from smithy-java and it was missing a few things (like Body members on some input/output types).

Note: More codegen tests will be introduced as the SEP is finalized and more tests are converted to BDD format.

In addition to the new tests I performed various benchmarks against the S3 test suite. The raw data is too large to include here, but across the 347 tests in the S3 model the average speedup compared to the previous tree based generation was ~55%.

Checklist

  • For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "client," "server," or both in the applies_to key.
  • For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "aws-sdk-rust" in the applies_to key.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Some/most of these will be throwaway like setting "requireEndpointResolver": false
in build.gradle.kts.
Still lots of missing pieces
Conditions generate, but not completely correctly. There are some compilation
errors, mostly around Options where primitive types are expected.
Coalesce works and is inlined.
Still a bug in Booleq and Stringeq with refs vs owned types.
Still some lifetime issues
TODO: impl results and fix the evaluate_bdd impl
TODO generate results and tests
Still need to add the tests
failures:
    config::endpoint::test::test_134
    config::endpoint::test::test_135
    config::endpoint::test::test_139
    config::endpoint::test::test_246

github-actions bot commented Dec 3, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

It was missing for some reason. I stole the model from smithy-java so
they might have done some processing of it.

Add StreamingBlob shape to S3 model

Contributor

@aajtodd aajtodd left a comment

Overall looks good! Nice work.

Mostly minor comments outside of existing TODO's and things we've already discussed re:testing.

return rules.getBuiltIn(builtIn)
val endpointBddTrait = idx.getEndpointBddTrait(serviceShape)
val rules = idx.endpointRulesForService(serviceShape)
if (endpointBddTrait != null) {

Contributor

nit/style: you can avoid multiple returns IIRC

return if(...) {
    <value>
} else if (...){
    <value>
} else {
     <value>
}

* are used to generate `Result`s. This requires some cloning and unwrapping. The
* unwraps are all `unwrap_or_default`. This is safe since it will not panic, and
* thanks to the guarantees of BDD compilation we know that if a value is used to
* construct a Result it has a value, so default values will never be used.

Contributor

If we always expect a value why not just unwrap() or expect() then

Contributor Author

@landonxjames landonxjames Dec 11, 2025

I should clarify this comment. The gist of it is that we always expect a Some(_) from the variables that are used in constructing the result that is decided on at runtime. Variables that are not part of the actual result might be Some(_) or None. To simplify the codegen, all of the variables are unwrapped up front and ready to use. The default values cover the variables that are None, so they don't fail on .unwrap().

I was considering attempting to move the variable unwrapping into the individual match arms to avoid doing this as a batch, but I haven't had a chance to experiment with that yet. I also worry that this might increase the code size quite a bit, since we will have multiple locations where each value would need to be unwrapped. But this is already a lot of code so it might not matter too much.
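
To make the trade-off concrete, here is a minimal sketch of the two styles side by side. The two-field `ConditionContext`, the two results, and the string-based return type are simplified assumptions for illustration, not the generated code.

```rust
// Hypothetical, trimmed-down stand-ins for the generated types.
#[derive(Default)]
struct ConditionContext {
    region_prefix: Option<String>,
    url_host: Option<String>,
}

enum ResultEndpoint {
    Result0,
    Result1,
}

impl ResultEndpoint {
    // Batch style: every variable is defaulted up front, whether or not the
    // selected arm uses it.
    fn to_url_batch(&self, context: &ConditionContext) -> Result<String, String> {
        let region_prefix = context.region_prefix.clone().unwrap_or_default();
        let url_host = context.url_host.clone().unwrap_or_default();
        match self {
            Self::Result0 => Err("No endpoint rule matched".to_string()),
            Self::Result1 => Ok(format!("https://{region_prefix}.{url_host}")),
        }
    }

    // Per-arm style: only the variables an arm needs are unwrapped, and a
    // missing value surfaces as an error instead of an empty default.
    fn to_url_per_arm(&self, context: &ConditionContext) -> Result<String, String> {
        match self {
            Self::Result0 => Err("No endpoint rule matched".to_string()),
            Self::Result1 => {
                let region_prefix =
                    context.region_prefix.as_deref().ok_or("region_prefix unset")?;
                let url_host = context.url_host.as_deref().ok_or("url_host unset")?;
                Ok(format!("https://{region_prefix}.{url_host}"))
            }
        }
    }
}
```

With a fully populated context both functions agree. With an empty context the batch version silently builds a URL from empty strings, while the per-arm version reports the missing value, at the cost of repeating the unwrapping in every arm that needs it.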

else -> rust("${ref.name.rustName()}.to_owned()")
}
} catch (_: RuntimeException) {
// Typechecking was never invoked, default to .to_owned()

Contributor

When do we expect this exception? simply calling ref.type()?

Contributor Author

Correct, this was very counter-intuitive for me, but it can fail if the value has not been previously typeChecked: https://smithy.io/javadoc/1.62.0/software/amazon/smithy/rulesengine/language/syntax/expressions/Expression.html#type()

fun testGenerator(): Writable =
defaultResolver()?.let {
// BDD takes priority just like in EndpointDecorator
defaultResolverBdd()?.let {

Contributor

We have a lot of branching logic around BDD vs fallback to endpoint rules. I'm assuming the plan is eventually we'll phase out endpoint rules and just have BDD, in which case this seems ok?

Contributor Author

Correct, when BDDs are fully published all of this branching logic can be pulled out and we would only use the BDD. I would like to keep it in for at least a few weeks after BDD models launch, just so we have an easy way to flip back to the previous solution if we encounter a show-stopping bug. I can add TODOs around removing these once Trebuchet fully onboards all models to BDD traits.

loop {
match current_ref {
// Result references (>= 100_000_000)
ref_val if ref_val >= 100_000_000 => {

Contributor

nit: Maybe define constants for these magic numbers and terminals, etc.
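
One possible shape for this, sketched under the assumption that the encoding stays as shown in the PR description. The constant names, the `Ref` enum, and `classify` are hypothetical and do not appear in the PR.

```rust
/// References at or above this value point into the results array.
const RESULT_REF_OFFSET: i32 = 100_000_000;
/// Terminal references; both resolve to the NoMatchRule result.
const TERMINAL_TRUE: i32 = 1;
const TERMINAL_FALSE: i32 = -1;

/// Hypothetical decoded form of a raw BDD reference.
#[derive(Debug, PartialEq)]
enum Ref {
    Result(usize),
    Terminal,
    Node { index: usize, complement: bool },
}

fn classify(reference: i32) -> Option<Ref> {
    match reference {
        r if r >= RESULT_REF_OFFSET => Some(Ref::Result((r - RESULT_REF_OFFSET) as usize)),
        TERMINAL_TRUE | TERMINAL_FALSE => Some(Ref::Terminal),
        // Node references are 1-based, so 0 has no node to point at.
        0 => None,
        r => Some(Ref::Node {
            index: (r.unsigned_abs() - 1) as usize,
            complement: r < 0,
        }),
    }
}
```

Decoding the reference in one place would also keep the index arithmetic out of the evaluation loop, for example `classify(100000157)` yields `Ref::Result(157)`.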

/// * `Some(R)` - Result if evaluation succeeds
/// * `None` - No match found
#[allow(clippy::too_many_arguments)]
pub(crate) fn evaluate_bdd<'a, Cond, Params, Res: Clone, Context>(

Contributor

Are we relying on generated endpoint tests for testing this?

Contributor Author

Yes. I could probably hand write some small BDD examples for unit testing the result/terminal behavior, but I think in general it will be hard to have handwritten BDD tests.

/*
* Implementation Note: Autoderef Specialization
*
* This implementation uses autoderef specialization to dispatch to different

Contributor

Not familiar with autoderef specialization, maybe some links to resources about it for posterity. Also it doesn't require nightly or anything right?

Contributor Author

The only reference I am aware of is this blog post (which is included in the PR description, but not the comment) https://lukaskalbertodt.github.io/2019/12/05/generalized-autoref-based-specialization.html I'll add it to the comment.

Also it doesn't require nightly or anything right?

It does not. Actual language support for non-deref based specialization does require nightly, so I avoided using that https://doc.rust-lang.org/beta/unstable-book/language-features/specialization.html
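
For readers unfamiliar with the technique, here is a minimal, generic sketch of autoref-based specialization on stable Rust. It is not the code from this PR: `Wrap`, `ViaDisplay`, `ViaFallback`, and `NoDisplay` are names made up for the example.

```rust
use std::fmt::Display;

struct Wrap<T>(T);

// A type with no Display impl, to exercise the fallback path.
struct NoDisplay;

// Higher priority: implemented on `Wrap<T>`, so a call through `&Wrap<T>`
// matches this `&self` method without any extra auto-reference.
trait ViaDisplay {
    fn describe(&self) -> String;
}
impl<T: Display> ViaDisplay for Wrap<T> {
    fn describe(&self) -> String {
        format!("display: {}", self.0)
    }
}

// Lower priority: implemented on `&Wrap<T>`, so it is only reached after the
// compiler adds one more auto-reference during method resolution.
trait ViaFallback {
    fn describe(&self) -> String;
}
impl<T> ViaFallback for &Wrap<T> {
    fn describe(&self) -> String {
        "fallback".to_string()
    }
}

fn describe_int() -> String {
    // i32 implements Display, so the ViaDisplay impl is selected.
    (&Wrap(5)).describe()
}

fn describe_opaque() -> String {
    // NoDisplay fails the `T: Display` bound, so resolution falls through
    // to the ViaFallback impl.
    (&Wrap(NoDisplay)).describe()
}
```

The dispatch relies on the concrete type being known at the call site, so it works in macros and generated code but not inside a function that is itself generic over `T`.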

raw: &'a str,
}

impl Default for Url<'_> {

Contributor

Not sure default makes sense here either. Presumably we are abusing unwrap_or_default() or something for this? Do we not have other options?

Contributor Author

Presumably we are abusing unwrap_or_default() or something for this?

This is exactly the reason. Agree that it is kind of a gross default (same with ARN), but these are not pub types so I think we can potentially get away with it. Addressed how this is used and a potential alternative that would avoid it in this comment.

@landonxjames landonxjames marked this pull request as ready for review December 11, 2025 18:52
@landonxjames landonxjames requested review from a team as code owners December 11, 2025 18:52