Add database models and queries for External Subnets #9679

bnaecker · 2026-01-18T21:21:40Z

Adds model types for Subnet Pools, Subnet Pool Members, links between Subnet Pools and Silos, and External Subnets.
Add queries for CRUD on all four, which enforce our invariants like no overlapping IP members or subnets, and do allocation for External Subnets from an explicitly-requested subnet, or automatically from a pool. Pools can be specified either by name / ID or an optional IP version. In the later cases, the Subnet Pool is chosen from the pools linked to the silo.
Adds some scaffolding useful for coming integration with the public API, such as authz types and checks and create-time parameter types.
Closes Database models and queries for external subnets #9578

- Adds model types for Subnet Pools, Subnet Pool Members, links between Subnet Pools and Silos, and External Subnets. - Add queries for CRUD on all four, which enforce our invariants like no overlapping IP members or subnets, and do allocation for External Subnets from an explicitly-requested subnet, or automatically from a pool. Pools can be specified either by name / ID or an optional IP version. In the later cases, the Subnet Pool is chosen from the pools linked to the silo. - Adds some scaffolding useful for coming integration with the public API, such as authz types and checks and create-time parameter types. - Closes #9578

values where the prefix is different.

bnaecker · 2026-01-20T16:49:36Z

Excepting a CI flake, this is ready for review. The bad news is this PR is pretty big. The good is that it's mostly tests. The SQL involved is pretty subtle, so I went out of the way test it as much as I could.

nexus/auth/src/authz/api_resources.rs

mergeconflict · 2026-01-20T17:18:04Z

nexus/auth/src/authz/api_resources.rs

+impl oso::PolarClass for SubnetPoolList {
+    fn get_polar_class_builder() -> oso::ClassBuilder<Self> {
+        oso::Class::builder()
+            .with_equality_check()
+            .add_attribute_getter("fleet", |_: &SubnetPoolList| FLEET)
+    }
+}


You don't need to add a comment here or anything, since this looks like it's probably really straightforward if you know some basics, but could you give me (or point me at) a quick tutorial on Oso sometime?

Dave Pacheco would be the best person for detailed questions, but we have a debugging guide here and some general background in the block comment starting here.

nexus/db-lookup/src/lookup.rs

nexus/db-model/src/external_subnet.rs

mergeconflict · 2026-01-20T17:58:49Z

schema/crdb/add-external-subnets/up01.sql

I'm guessing these are some sort of schema update fragments? Is it like each DDL statement has to be in its own file?

Yeah, that's right. We have some docs here, but the gist is that you write each individual step in the update in its own file, and we run them under in a transaction during an upgrade. We have a test that ensures running all the updates results in the same schema as the overall dbinit.sql file.

mergeconflict · 2026-01-20T18:53:38Z

nexus/db-queries/src/db/datastore/external_subnet.rs

+        let Error::InvalidRequest { message } = &err else {
+            panic!("expected a NotFound error, found {err:#?}");
+        };


Can you use assert_matches! instead of panic here?

Yeah definitely. I like this formulation since it destructures the returned type too, but either would work.

mergeconflict · 2026-01-20T19:03:35Z

nexus/db-queries/src/db/datastore/external_subnet.rs

+        let _ = datastore
+            .unlink_subnet_pool_from_silo(
+                opctx,
+                &authz_pool,
+                &pool,
+                &authz_silo,
+            )
+            .await
+            .expect("able to unlink pool from silo");
+        let _ = datastore
+            .link_subnet_pool_to_silo(opctx, &new_authz_pool, &authz_silo, true)
+            .await
+            .expect("able to link pool to silo after unlinking other");


Rust style question: would it be (marginally) better to have tests like this return Result, so we could just write:

datastore.unlink_subnet_pool_from_silo(...).await?;

Or is it preferable to do the let _ = [...].expect() for a more explicit test failure message?

We generally do the latter, to get more contextual information about the failure and why we didn't expect it. But you can definitely write tests that return a Result directly. I believe the test harness will unwrap it in that case, but I'm not sure.

mergeconflict · 2026-01-20T19:12:00Z

nexus/db-queries/src/db/queries/external_subnet.rs

Is there a way to make sure we're using the expected indexes for these queries? Can we EXPLAIN ANALYZE or something with CRDB?

We do that sometimes, yes. We have a trait for explaining a query and getting the results. The main way we enforce using indexes is that all queries are forbidden to result in a plan with a full table or index scan. (The exception here is that you can disable this for data migrations during a schema upgrade.) This doesn't depend on the size of the table. No full scans at any point, of anything.

I use that ExplainableAsync trait in this PR, in the can_explain_insert_external_subnet_from_explicit_ip_query test.

mergeconflict · 2026-01-20T19:13:28Z

nexus/db-queries/src/db/queries/external_subnet.rs

+/// This query is pretty complicated. First, we support creating a subnet in a
+/// few different ways, such as by an explicit IP subnet, automatically from a
+/// specific pool, or automatically from a pool linked to the current Silo.


Do we need to express all cases in one query, or is possible to split them up?

I guess it depends on what you mean, but there really are four separate queries generated by this function:

requesting by explicit subnet

requesting by explicit pool (name or ID)

requesting by default pool of specific version

requesting the default pool, of which there has to be only one.

If you mean, "can we split up all these CTEs into different queries", then...you could, at the cost of an explicit interactive transaction. We try to avoid that, but it's obviously a tradeoff. This query is big and complicated, and the contention we might run into could be worth a simpler query. I personally try very hard to avoid explicit transactions, but I'm probably overly-sensitive to that.

nexus/db-queries/src/db/queries/external_subnet.rs

FelixMcFelix

Thanks Ben, I have a few higher-level questions but otherwise this looks like it's handling some very tricky problems correctly. Thanks a lot for the robust testing here, particularly around gap sizing and allocation.

nexus/db-model/src/external_subnet.rs

FelixMcFelix · 2026-01-21T13:21:59Z

nexus/db-queries/src/db/datastore/external_subnet.rs

+    /// Create a new Subnet Pool Member.
+    ///
+    /// IP subnets must be unique across all Subnet Pool Members, in all pools.
+    /// Any request to create a member with an overlapping IP subnet will fail.
+    pub async fn create_subnet_pool_member(
+        &self,
+        opctx: &OpContext,
+        authz_pool: &authz::SubnetPool,
+        params: &params::SubnetPoolMemberCreate,
+    ) -> CreateResult<SubnetPoolMember> {
+        opctx.authorize(authz::Action::CreateChild, authz_pool).await?;
+        let member = SubnetPoolMember::new(params, authz_pool.id())?;
+        insert_subnet_pool_member_query(&member)


We'll also need to prevent overlap with any IP pools, otherwise we might end up with some very strange NAT table behaviour at the switch. That ask is probably more relevant to the mechanics of insert_subnet_pool_member_query.

From the outside I had thought that Pool-typed AddressLots would be the central point for resource allocation between subnet pools and IP pools. But some digging pointed me at #3448, so that wouldn't work without an almighty migration that is 100% not in scope (and probably would be wasted given the rework proposed in RFD533). Practically I think this means that insert_subnet_pool_member_query and queries::ip_pool::FilterOverlappingIpRanges need to account for both of these tables?

Oof good catch. I will see about adding that to both of these queries. Depending on how bad that is, I might defer that to a follow-up PR. This one is already enormous.

FelixMcFelix · 2026-01-21T13:30:00Z

nexus/db-queries/src/db/datastore/external_subnet.rs

+        authz_project: &authz::Project,
+        params: params::ExternalSubnetCreate,
+    ) -> CreateResult<ExternalSubnet> {
+        opctx.authorize(authz::Action::CreateChild, authz_project).await?;


Should we also require CreateChild permissions on the pool that the subnet is drawn from, in keeping with the EIP model?

That would be nice, but I don't think we can directly do that. We don't know which pool is being drawn from until we run the query, at least in the current formulation. The only case I'm aware of doing that in the EIP case is when allocating a service IP. Today at least, we know all the pools that could come from and so we can do that before running the allocation query.

I see, thanks for the explanation there.

nexus/db-queries/src/db/datastore/external_subnet.rs

nexus/types/src/external_api/params.rs

schema/crdb/dbinit.sql

nexus/types/src/external_api/params.rs

nexus/db-queries/src/db/queries/external_subnet.rs

- Fix a couple of index and schema update fragment bugs - Comments on some subtle / confusing database details - Handle overflow at the very end of IP address space - Remove `silo_id` from the `external_subnet` table. - Various typo fixes

bnaecker · 2026-01-22T06:42:22Z

Thanks @FelixMcFelix and @mergeconflict for the thoughtful reviews. I've addressed almost all the comments. There are two outstanding items.

Check for overlap between IP Pools and Subnet Pools when inserting either. Add a test on both sides.
Make a new Nexus API version for the changed ExternalSubnetAllocator variants that disallow both a subnet and a pool, since the former fully-constrains the latter.

Sorry to say both of these will add even more the PR. At least its mostly generated code? 😬

bnaecker and others added 3 commits January 18, 2026 13:20

Fix IAM role test, cleanup on successful external subnet tests

3ce5d50

Add more tests, work around some...looseness in CRDB's sorting of INET

eafdccd

values where the prefix is different.

bnaecker requested review from FelixMcFelix, mergeconflict and rcgoodfellow January 20, 2026 16:48

mergeconflict reviewed Jan 20, 2026

View reviewed changes

mergeconflict mentioned this pull request Jan 21, 2026

Allow implementations of AuthorizedResource::load_roles to use async fn syntax. #9693

Open

FelixMcFelix reviewed Jan 21, 2026

View reviewed changes

bnaecker added 3 commits January 21, 2026 15:50

Merge branch 'main' into external-subnet-database-crud

71c372a

Fixup after merge with main

a3fb659

Review feedback

0564f98

- Fix a couple of index and schema update fragment bugs - Comments on some subtle / confusing database details - Handle overflow at the very end of IP address space - Remove `silo_id` from the `external_subnet` table. - Various typo fixes

bnaecker added 2 commits January 22, 2026 08:45

Merge branch 'main' into external-subnet-database-crud

5a1364a

Update OpenAPI spec for safer subnet allocation type

93f672c

Add database models and queries for External Subnets #9679

Are you sure you want to change the base?

Add database models and queries for External Subnets #9679

Uh oh!

Conversation

bnaecker commented Jan 18, 2026

Uh oh!

bnaecker commented Jan 20, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

FelixMcFelix left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bnaecker commented Jan 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants