BDD Endpoint Codegen #4433

landonxjames · 2025-12-03T04:34:40Z

Note: before this PR is merged, the S3 model needs to be reverted to the published one. It currently has a custom S3 model that uses an endpointBddTrait instead, purely for CI testing and diff viewing.

Motivation and Context

Currently all AWS SDKs use a rules engine to resolve service endpoints before every operation. The current Endpoints 2.0 decision tree design creates significant problems that directly impact customers:

Larger SDK artifacts: S3's ruleset is 427 KB of JSON. The Rust SDK's generated resolver is 452 KB and nearly 6,000 lines of code. Customers download larger SDK packages and experience slower compilation times. Mobile and embedded applications face stricter size constraints that make AWS SDK adoption difficult.

Worse operation performance: Punishing fallback logic in decision tree evaluation requires 68 redundant condition checks for a simple S3 bucket+region endpoint. Customers experience slower operation initialization, particularly in serverless environments where cold starts are critical.

Description

This PR implements code generation for the new endpointBdd trait, which uses Binary Decision Diagrams (BDDs) to optimize endpoint resolution. BDDs provide a more compact representation of endpoint rules compared to the existing endpointRuleSet trait, reducing binary size and improving resolution performance. More information about the new trait can be found on the Smithy website: https://smithy.io/2.0/additional-specs/rules-engine/specification.html#smithy-rules-endpointbdd-trait

The entrypoint to the code generation is in EndpointsDecorator.kt where, if an endpointBdd trait exists, it is prioritized over the endpointRuleSet trait. This invokes the EndpointBddGenerator which orchestrates most of the code generation. The functions in the match arms of each of the condition blocks are generated by the BddExpressionGenerator class. The most recent diff of this generated code is available here.

The most notable new Rust code lives in the rust-runtime/inlineable/ crate:

endpoint_lib/bdd_interpreter.rs contains the evaluate_bdd function used to actually traverse the BDD DAG and extract a Result.
endpoint_lib/coalesce.rs exports a new coalesce! macro that implements the new endpoint std lib coalesce function. Note that the macro here uses a technique called autoderef-based specialization and was inspired by this blog post: https://lukaskalbertodt.github.io/2019/12/05/generalized-autoref-based-specialization.html
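
To illustrate the specialization technique mentioned above, here is a minimal sketch and not the macro from this PR. Wrap, OptionKind, PlainKind, and coalesce_sketch! are invented names, and the real coalesce! may handle more argument shapes. The idea is that method resolution tries the receiver as written before adding another reference, so the impl that is reachable without the extra reference takes priority.

```rust
// Hypothetical sketch: none of these names come from endpoint_lib/coalesce.rs.

/// Local wrapper so both impls below attach to a type we own.
struct Wrap<T>(T);

/// Higher-priority case: the first argument is an `Option<T>`.
trait OptionKind {
    type Out;
    fn or_fallback(&self, fallback: Self::Out) -> Self::Out;
}

impl<T: Clone> OptionKind for Wrap<Option<T>> {
    type Out = T;
    fn or_fallback(&self, fallback: T) -> T {
        // Use the wrapped value when present, otherwise the fallback.
        self.0.clone().unwrap_or(fallback)
    }
}

/// Lower-priority case: any other value, which is always treated as set.
trait PlainKind {
    type Out;
    fn or_fallback(&self, fallback: Self::Out) -> Self::Out;
}

// Implemented for `&Wrap<T>`, so this method is only found after the
// compiler adds one more reference to the receiver.
impl<T: Clone> PlainKind for &Wrap<T> {
    type Out = T;
    fn or_fallback(&self, _fallback: T) -> T {
        self.0.clone()
    }
}

macro_rules! coalesce_sketch {
    ($value:expr, $fallback:expr) => {
        (&Wrap($value)).or_fallback($fallback)
    };
}
```

With these definitions, coalesce_sketch!(None::<String>, "x".to_string()) picks the OptionKind impl and yields the fallback, while coalesce_sketch!("plain".to_string(), "x".to_string()) falls through to PlainKind and yields the first argument unchanged.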

Diving deeper into the code in the diff there are a few important new types to discuss:

struct BddNode {
    pub condition_index: i32,
    pub high_ref: i32,
    pub low_ref: i32,
}

enum ConditionFn {
    Cond0,
    Cond1,
    Cond2,
    ...
}

enum ResultEndpoint {
    Result0,
    Result1,
    Result2,
    ...
}

struct ConditionContext {
    partition_result: Option<crate::endpoint_lib::partition::Partition>,
    url: Option<crate::endpoint_lib::parse_url::Url>,
    region_prefix: Option<String>,
    ...
}

struct BddNode containing the condition_index, high_ref, and low_ref.
enum ConditionFn to represent the functions for evaluating each condition.
enum ResultEndpoint to represent the functions for generating each result.
struct ConditionContext to represent variables that can be assigned by Conditions.

For each of BddNode, ConditionFn, and ResultEndpoint there is a constant-sized array containing instances of each, so they can be referenced by index:

const NODES: [BddNode; 807] = [
    BddNode {
        condition_index: -1,
        high_ref: 1,
        low_ref: -1,
    },
    BddNode {
        condition_index: 0,
        high_ref: 3,
        low_ref: 100000157,
    },
    BddNode {
        condition_index: 1,
        high_ref: 646,
        low_ref: 4,
    },
    ...
];

const CONDITIONS: [ConditionFn; 91] = [
    ConditionFn::Cond0,
    ConditionFn::Cond1,
    ConditionFn::Cond2,
    ...
];

const RESULTS: [ResultEndpoint; 158] = [
    ResultEndpoint::Result0,
    ResultEndpoint::Result1,
    ResultEndpoint::Result2,
    ...
];
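
The high_ref and low_ref values in NODES are not plain array indices. Going by the evaluate_bdd listing later in this description, a reference of 100_000_000 or more selects a result, 1 and -1 are terminals, and anything else is a one-based node reference whose sign marks a complemented edge. The sketch below decodes a reference under those assumptions. The Ref enum, RESULT_OFFSET, and decode_ref are hypothetical names that do not appear in the PR.

```rust
/// Offset at which references switch from nodes to results, as used in the
/// `evaluate_bdd` listing in this description.
const RESULT_OFFSET: i32 = 100_000_000;

/// Hypothetical helper type; the PR works with raw `i32` references.
#[derive(Debug, PartialEq)]
enum Ref {
    /// `1` or `-1`: evaluation ends with the NoMatchRule result.
    Terminal,
    /// Zero-based index into `RESULTS`.
    Result(usize),
    /// Zero-based index into `NODES`, plus whether the edge is complemented.
    Node { index: usize, complement: bool },
}

fn decode_ref(reference: i32) -> Ref {
    match reference {
        r if r >= RESULT_OFFSET => Ref::Result((r - RESULT_OFFSET) as usize),
        1 | -1 => Ref::Terminal,
        r => Ref::Node {
            // Node references are one-based. `wrapping_sub` keeps the invalid
            // reference 0 from panicking; it decodes to an out-of-range index.
            index: r.unsigned_abs().wrapping_sub(1) as usize,
            complement: r < 0,
        },
    }
}
```

For example, the low_ref of 100000157 shown above decodes to Ref::Result(157), the last entry of the 158-element RESULTS array, and a high_ref of 3 decodes to the node at index 2.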

ConditionFn implements a single function, evaluate, which takes the modeled endpoint Params, the ConditionContext used to pass state between condition evaluations, and a diagnostic_collector for tracking data about the evaluation.

impl ConditionFn {
    fn evaluate(
        &self,
        params: &Params,
        context: &mut ConditionContext,
        diagnostic_collector: &mut DiagnosticCollector,
    ) -> bool {
        // Param bindings
        let bucket = &params.bucket;
        let region = &params.region;
        ...

        // Non-Param references
        let partition_result = &mut context.partition_result;
        let url = &mut context.url;
        ...
        
        match self {
            Self::Cond0 => region.is_some(),
            Self::Cond1 => (accelerate) == (&true),
            Self::Cond2 => (use_fips) == (&true),
            Self::Cond3 => endpoint.is_some(),
            Self::Cond4 => bucket.is_some(),
            Self::Cond5 => {
                parse_arn(if let Some(param) = bucket { 
                    param 
                } else { 
                    return false 
                }, 
                diagnostic_collector)
                .is_some()
            }
            Self::Cond6 => {
                ("arn:")
                    == (coalesce!(
                        substring(
                            if let Some(param) = bucket { param } else { return false },
                            0,
                            4,
                            false,
                            diagnostic_collector
                        ),
                        "",
                    ))
            }
            ...
        }
    }
}
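
Two patterns in the generated conditions are worth isolating: an unset optional param short-circuits the whole condition to false, and a condition can store an intermediate value in the shared context for later conditions and results to read. The following is a reduced sketch of both. The variant names, the bucket_prefix field, and the cut-down Params and ConditionContext are all hypothetical, and the DiagnosticCollector is omitted.

```rust
// Hypothetical stand-ins for the generated `Params` and `ConditionContext`.
struct Params {
    bucket: Option<String>,
}

#[derive(Default)]
struct ConditionContext {
    bucket_prefix: Option<String>,
}

enum ConditionFn {
    BucketIsSet,
    BucketHasPrefix,
}

impl ConditionFn {
    fn evaluate(&self, params: &Params, context: &mut ConditionContext) -> bool {
        match self {
            Self::BucketIsSet => params.bucket.is_some(),
            Self::BucketHasPrefix => {
                // An unset param short-circuits the condition to `false`,
                // like the `if let ... else { return false }` arms above.
                let Some(bucket) = params.bucket.as_deref() else {
                    return false;
                };
                // Store the intermediate value so later conditions and
                // results can read it from the shared context.
                context.bucket_prefix = bucket.get(0..4).map(str::to_owned);
                context.bucket_prefix.is_some()
            }
        }
    }
}
```

The let-else form used here is equivalent to the inline if let ... else { return false } expressions in the generated code; it is only a readability choice for the sketch.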

ResultEndpoint also implements a single function, to_endpoint, that returns either an Endpoint or a ResolveEndpointError depending on which result index is passed. Note that index 0 corresponds to the NoMatchRule.

impl ResultEndpoint {
    fn to_endpoint(
        &self,
        params: &Params,
        context: &ConditionContext,
    ) -> Result<Endpoint, ResolveEndpointError> {
        // Param bindings
        let bucket = params.bucket.as_ref().map(|s| s.clone()).unwrap_or_default();
        let region = params.region.as_ref().map(|s| s.clone()).unwrap_or_default();
        ...

        // Non-Param references
        let partition_result = context.partition_result.as_ref().map(|s| s.clone()).unwrap_or_default();
        let url = context.url.as_ref().map(|s| s.clone()).unwrap_or_default();
        ...

        match self {
            Self::Result0 => Result::Err(ResolveEndpointError::message("No endpoint rule matched")),
            Self::Result1 => Result::Err(ResolveEndpointError::message(
                "Accelerate cannot be used with FIPS".to_string(),
            )),
            ...
            Self::Result8 => Result::Ok(
                Endpoint::builder()
                    .url({
                        let mut out = String::new();
                        out.push_str(&url.scheme());
                        out.push_str("://");
                        out.push_str(&url.authority());
                        out.push('/');
                        out.push_str(&uri_encoded_bucket);
                        out.push_str(&url.path());
                        out
                    })
                    .property("backend", "S3Express".to_string())
                    .property(
                        "authSchemes",
                        vec![aws_smithy_types::Document::from({
                            let mut out = HashMap::<String, aws_smithy_types::Document>::new();
                            out.insert("disableDoubleEncoding".to_string(), true.into());
                            out.insert("name".to_string(), "sigv4".to_string().into());
                            out.insert("signingName".to_string(), "s3express".to_string().into());
                            out.insert("signingRegion".to_string(), region.to_owned().into());
                            out
                        })],
                    )
                    .build(),
            ),
            ...
        }
    }
}

The final important portion of our resolver is the evaluate_bdd function. This function is handwritten and works with the generated nodes, conditions, and results for each service. It takes four generic type parameters: Cond for the generated conditions, Params for the modeled endpoint params, Res for the generated results, and Context for the shared condition context.

/// Evaluates a BDD to resolve an endpoint result
///
/// Arguments
/// * `nodes` - Array of BDD nodes
/// * `conditions` - Array of conditions referenced by nodes
/// * `results` - Array of possible results
/// * `root_ref` - Root reference to start evaluation
/// * `params` - Parameters for condition evaluation
/// * `context` - Values that can be set/mutated by the conditions
/// * `diagnostic_collector` - a struct for collecting information about the execution of conditions
/// * `condition_evaluator` - Function to evaluate individual conditions with params and context
///
/// Returns
/// * `Some(Res)` - The selected result; terminals resolve to `results[0]` (the NoMatchRule)
/// * `None` - A node, condition, or result index was out of bounds
pub fn evaluate_bdd<Cond, Params, Res: Clone, Context>(
    nodes: &[BddNode],
    conditions: &[Cond],
    results: &[Res],
    root_ref: i32,
    params: &Params,
    context: &mut Context,
    diagnostic_collector: &mut DiagnosticCollector,
    mut condition_evaluator: impl FnMut(
        &Cond,
        &Params,
        &mut Context,
        &mut DiagnosticCollector,
    ) -> bool,
) -> Option<Res> {
    let mut current_ref = root_ref;

    loop {
        match current_ref {
            // Result references (>= 100_000_000)
            ref_val if ref_val >= 100_000_000 => {
                // Slice indexing needs usize, and `get` yields a reference
                let result_index = (ref_val - 100_000_000) as usize;
                return results.get(result_index).cloned();
            }
            // Terminals (1 = TRUE, -1 = FALSE) NoMatchRule
            1 | -1 => return results.get(0).cloned(),
            // Node references
            ref_val => {
                let is_complement = ref_val < 0;
                // References are one-based; node indices are zero-based
                let node_index = (ref_val.abs() - 1) as usize;

                let node = nodes.get(node_index)?;
                let condition_index = node.condition_index as usize;

                let condition = conditions.get(condition_index)?;
                let condition_result =
                    condition_evaluator(condition, params, context, diagnostic_collector);

                // Handle complement edges: complement inverts the branch selection
                current_ref = if is_complement ^ condition_result {
                    node.high_ref
                } else {
                    node.low_ref
                };
            }
        }
    }
}
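
To make the traversal concrete, here is a toy two-condition BDD walked with the same branching rule as evaluate_bdd. This is a sketch under stated assumptions: the Params fields, the node layout, the result strings, and the walk and condition helpers are all made up for illustration, and the DiagnosticCollector and ConditionContext are left out to keep it self-contained.

```rust
// Toy model: every name below is illustrative, not generated code.

#[derive(Clone, Copy)]
struct BddNode {
    condition_index: i32,
    high_ref: i32,
    low_ref: i32,
}

struct Params {
    region: Option<&'static str>,
    use_fips: bool,
}

const NODES: [BddNode; 3] = [
    // Index 0 mirrors the placeholder terminal node in the generated array.
    BddNode { condition_index: -1, high_ref: 1, low_ref: -1 },
    // Reference 2: "is region set?" -> reference 3, otherwise result 1.
    BddNode { condition_index: 0, high_ref: 3, low_ref: 100_000_001 },
    // Reference 3: "is FIPS enabled?" -> result 2, otherwise result 3.
    BddNode { condition_index: 1, high_ref: 100_000_002, low_ref: 100_000_003 },
];

const RESULTS: [&str; 4] = [
    "no match",
    "error: region required",
    "fips endpoint",
    "standard endpoint",
];

fn condition(index: usize, params: &Params) -> bool {
    match index {
        0 => params.region.is_some(),
        1 => params.use_fips,
        _ => false,
    }
}

fn walk(root_ref: i32, params: &Params) -> Option<&'static str> {
    let mut current = root_ref;
    loop {
        match current {
            // Result reference: strip the offset and index into RESULTS.
            r if r >= 100_000_000 => return RESULTS.get((r - 100_000_000) as usize).copied(),
            // Terminals resolve to the NoMatchRule at index 0.
            1 | -1 => return RESULTS.first().copied(),
            r => {
                // One-based node reference; copy the node out of the array.
                let node = *NODES.get((r.abs() - 1) as usize)?;
                let outcome = condition(node.condition_index as usize, params);
                // A negative (complemented) reference flips the branch taken.
                current = if (r < 0) ^ outcome { node.high_ref } else { node.low_ref };
            }
        }
    }
}
```

Starting from root reference 2, a request with a region and FIPS enabled evaluates two conditions and lands on "fips endpoint"; a request without a region stops after one condition with "error: region required". Entering the second node through the complemented reference -3 swaps its two outcomes.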

TODO: There are a few tasks left for the BDD work to be fully complete:

Update the existing tests to include endpointBdd traits. This work is being done as part of the SEP process and will come in a subsequent PR.
Various optimization experiments. Currently the resolution is fast, but the aws_sdk_s3::config::endpoint::ResultEndpoint::to_endpoint symbol is still the largest one in my test binary. I would like to experiment with reducing the size there. This could look like splitting the match arms into separate functions, but a quick POC of that approach slightly increased the overall binary size even though it did reduce the size of the largest symbol.
Better diagnostic collection. It would be nice for the diagnostic collector to record the path through the BDD and the outcome of each Condition evaluation. This would help with debugging and understanding the performance of various cases. A standardized JSON format for this is one potential future extension of the SEP.
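
For the optimization item above, splitting the match arms into separate functions could look roughly like the sketch below. It is a guess at the shape of that experiment, not the POC itself: the three-variant enum, the String return types, the URL format, and the use of #[inline(never)] are all assumptions made for illustration.

```rust
// Hypothetical miniature of `ResultEndpoint`; the generated enum has one
// variant per modeled result and returns `Result<Endpoint, ResolveEndpointError>`.
enum ResultEndpoint {
    Result0,
    Result1,
    Result2,
}

impl ResultEndpoint {
    // The match only dispatches, so this symbol stays small.
    fn to_endpoint(&self, region: &str) -> Result<String, String> {
        match self {
            Self::Result0 => result_0(),
            Self::Result1 => result_1(),
            Self::Result2 => result_2(region),
        }
    }
}

// `#[inline(never)]` is an assumption: it keeps each arm as its own symbol
// so the optimizer does not fold the bodies back into `to_endpoint`.
#[inline(never)]
fn result_0() -> Result<String, String> {
    Err("No endpoint rule matched".to_string())
}

#[inline(never)]
fn result_1() -> Result<String, String> {
    Err("Accelerate cannot be used with FIPS".to_string())
}

#[inline(never)]
fn result_2(region: &str) -> Result<String, String> {
    // Illustrative URL shape only; real results are built from the ruleset.
    let mut out = String::new();
    out.push_str("https://s3.");
    out.push_str(region);
    out.push_str(".amazonaws.com");
    Ok(out)
}
```

Each extra function adds its own prologue and call overhead, which is one plausible reason the overall binary grew slightly in the POC even though the largest symbol shrank.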

Testing

New minimal tests are introduced for BDD. More will come as the SEP is fleshed out and I manage to compile some of the existing tests to BDD format.

Note: A few tests here are failing. I'm pretty sure all of those are due to differences in the S3 model I am using. I borrowed it from smithy-java and it was missing a few things (like Body members on some input/output types).

Note: More codegen tests will be introduced as the SEP is finalized and more tests are converted to BDD format.

In addition to the new tests, I performed various benchmarks against the S3 test suite. The raw data is too large to include here, but across the 347 tests in the S3 model the average speedup compared to the previous tree-based generation was ~55%.

Checklist

For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "client," "server," or both in the applies_to key.
For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "aws-sdk-rust" in the applies_to key.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Some/most of these will be throwaway like setting "requireEndpointResolver": false in build.gradle.kts.

…trait

Still lots of missing pieces

Conditions generate, but not completely correctly. There are some compilation errors, mostly around Options where primitive types are expected. Coalesce works and is inlined.

Still a bug in Booleq and Stringeq with refs vs owned types.

Still some lifetime issues

TODO: impl results and fix the evaluate_bdd impl

TODO generate results and tests

Still need to add the tests

failures: config::endpoint::test::test_134 config::endpoint::test::test_135 config::endpoint::test::test_139 config::endpoint::test::test_246

The test still fails to compile, but it is on the actual test code, not the generated endpoint code.

…stead of slices

github-actions · 2025-12-03T21:12:00Z