Add database models and queries for External Subnets #9679
Conversation
bnaecker commented Jan 18, 2026
- Adds model types for Subnet Pools, Subnet Pool Members, links between Subnet Pools and Silos, and External Subnets.
- Adds queries for CRUD on all four, which enforce our invariants like no overlapping IP members or subnets, and do allocation for External Subnets from an explicitly-requested subnet, or automatically from a pool. Pools can be specified either by name / ID or an optional IP version. In the latter cases, the Subnet Pool is chosen from the pools linked to the silo.
- Adds some scaffolding useful for coming integration with the public API, such as authz types and checks and create-time parameter types.
- Closes Database models and queries for external subnets #9578
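The no-overlap invariant can be sketched in plain Rust. This is an illustration only: the real check is enforced in SQL by the queries in this PR, and the `Subnet` type and its methods here are made up for the example.

```rust
use std::net::Ipv4Addr;

/// A simplified IPv4 subnet, e.g. 10.0.0.0/24 (hypothetical type; the
/// actual models in the PR differ).
#[derive(Clone, Copy)]
struct Subnet {
    addr: Ipv4Addr,
    prefix: u8,
}

impl Subnet {
    fn mask(&self) -> u32 {
        if self.prefix == 0 { 0 } else { u32::MAX << (32 - self.prefix) }
    }
    /// First address in the subnet, as an integer.
    fn first(&self) -> u32 {
        u32::from(self.addr) & self.mask()
    }
    /// Last address in the subnet, as an integer.
    fn last(&self) -> u32 {
        self.first() | !self.mask()
    }
    /// Two subnets overlap iff their address ranges intersect. Note this is
    /// true even when the prefix lengths differ, e.g. a /25 nested in a /24.
    fn overlaps(&self, other: &Subnet) -> bool {
        self.first() <= other.last() && other.first() <= self.last()
    }
}

fn main() {
    let a = Subnet { addr: Ipv4Addr::new(10, 0, 0, 0), prefix: 24 };
    let b = Subnet { addr: Ipv4Addr::new(10, 0, 0, 128), prefix: 25 };
    let c = Subnet { addr: Ipv4Addr::new(10, 0, 1, 0), prefix: 24 };
    assert!(a.overlaps(&b)); // the /25 is nested inside the /24
    assert!(!a.overlaps(&c)); // disjoint prefixes
    println!("overlap checks passed");
}
```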
Excepting a CI flake, this is ready for review. The bad news is this PR is pretty big. The good news is that it's mostly tests. The SQL involved is pretty subtle, so I went out of my way to test it as much as I could.
```rust
impl oso::PolarClass for SubnetPoolList {
    fn get_polar_class_builder() -> oso::ClassBuilder<Self> {
        oso::Class::builder()
            .with_equality_check()
            .add_attribute_getter("fleet", |_: &SubnetPoolList| FLEET)
    }
}
```
You don't need to add a comment here or anything, since this looks like it's probably really straightforward if you know some basics, but could you give me (or point me at) a quick tutorial on Oso sometime?
I'm guessing these are some sort of schema update fragments? Is it like each DDL statement has to be in its own file?
Yeah, that's right. We have some docs here, but the gist is that you write each individual step in the update in its own file, and we run them in a transaction during an upgrade. We have a test that ensures running all the updates results in the same schema as the overall dbinit.sql file.
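As a concrete sketch, one fragment file might hold a single DDL statement like the following. The file name and column set here are hypothetical, not the actual fragments in this PR:

```sql
-- up01.sql: one step of the schema update, in its own file.
CREATE TABLE IF NOT EXISTS omicron.public.subnet_pool (
    id UUID PRIMARY KEY,
    name STRING(63) NOT NULL,
    time_created TIMESTAMPTZ NOT NULL
);
```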
```rust
let Error::InvalidRequest { message } = &err else {
    panic!("expected a NotFound error, found {err:#?}");
};
```
Can you use `assert_matches!` instead of `panic!` here?
Yeah definitely. I like this formulation since it destructures the returned type too, but either would work.
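For comparison, a stable-Rust middle ground is `assert!(matches!(..))`, which checks the variant without destructuring it. A sketch with a toy error type (the real `Error` type in the codebase is different):

```rust
#[allow(dead_code)]
#[derive(Debug)]
enum Error {
    InvalidRequest { message: String },
    NotFound,
}

fn main() {
    let err = Error::InvalidRequest { message: String::from("bad subnet") };

    // Variant check only; no access to the message.
    assert!(matches!(err, Error::InvalidRequest { .. }));

    // let-else destructures the payload too, at the cost of a manual panic!.
    let Error::InvalidRequest { message } = &err else {
        panic!("expected an InvalidRequest error, found {err:#?}");
    };
    assert_eq!(message, "bad subnet");
    println!("both styles pass");
}
```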
```rust
let _ = datastore
    .unlink_subnet_pool_from_silo(
        opctx,
        &authz_pool,
        &pool,
        &authz_silo,
    )
    .await
    .expect("able to unlink pool from silo");
let _ = datastore
    .link_subnet_pool_to_silo(opctx, &new_authz_pool, &authz_silo, true)
    .await
    .expect("able to link pool to silo after unlinking other");
```
Rust style question: would it be (marginally) better to have tests like this return Result, so we could just write:

```rust
datastore.unlink_subnet_pool_from_silo(...).await?;
```

Or is it preferable to do the `let _ = [...].expect()` for a more explicit test failure message?
We generally do the latter, to get more contextual information about the failure and why we didn't expect it. But you can definitely write tests that return a Result directly. I believe the test harness will unwrap it in that case, but I'm not sure.
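The two styles side by side, as a sketch. The datastore call is stood in for by a fallible helper with a made-up name; for the record, the libtest harness does accept `fn() -> Result<(), E>` when the error type implements `Debug`, and reports the `Err` as a failure:

```rust
// Stand-in for a fallible datastore call; real signatures differ.
fn unlink_pool_from_silo(ok: bool) -> Result<(), String> {
    if ok {
        Ok(())
    } else {
        Err(String::from("pool is not linked to this silo"))
    }
}

// Style 1: unwrap with context. On failure, the panic message says what we
// expected to happen, which shows up directly in the test output.
fn test_with_expect() {
    unlink_pool_from_silo(true).expect("able to unlink pool from silo");
}

// Style 2: return a Result and use `?`. Less noise per call, but the only
// failure message is whatever the Err happens to carry.
fn test_with_result() -> Result<(), String> {
    unlink_pool_from_silo(true)?;
    Ok(())
}

fn main() {
    test_with_expect();
    test_with_result().expect("result-style test failed");
    println!("both test styles pass");
}
```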
Is there a way to make sure we're using the expected indexes for these queries? Can we EXPLAIN ANALYZE or something with CRDB?
We do that sometimes, yes. We have a trait for explaining a query and getting the results. The main way we enforce using indexes is that all queries are forbidden to result in a plan with a full table or index scan. (The exception here is that you can disable this for data migrations during a schema upgrade.) This doesn't depend on the size of the table. No full scans at any point, of anything.
I use that `ExplainableAsync` trait in this PR, in the `can_explain_insert_external_subnet_from_explicit_ip_query` test.
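The enforcement can be sketched as a text check over EXPLAIN output. This helper is illustrative, not the actual implementation; it relies on the fact that CockroachDB's EXPLAIN marks unconstrained scans with the words "FULL SCAN":

```rust
/// Fail if a query plan contains a full table or index scan.
/// `plan` is the text an EXPLAIN statement would return; the matching rule
/// here is a simplification of the real plan-checking machinery.
fn assert_no_full_scans(plan: &str) {
    for line in plan.lines() {
        assert!(
            !line.contains("FULL SCAN"),
            "query plan contains a full scan: {line}"
        );
    }
}

fn main() {
    // A plan constrained to index spans passes the check.
    let good_plan = "scan\n  table: subnet_pool@subnet_pool_pkey\n  spans: [/1 - /1]";
    assert_no_full_scans(good_plan);
    println!("no full scans detected");
}
```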
```rust
/// This query is pretty complicated. First, we support creating a subnet in a
/// few different ways, such as by an explicit IP subnet, automatically from a
/// specific pool, or automatically from a pool linked to the current Silo.
```
Do we need to express all cases in one query, or is it possible to split them up?
I guess it depends on what you mean, but there really are four separate queries generated by this function:
- requesting by explicit subnet
- requesting by explicit pool (name or ID)
- requesting by default pool of specific version
- requesting the default pool, of which there has to be only one.
If you mean, "can we split up all these CTEs into different queries", then...you could, at the cost of an explicit interactive transaction. We try to avoid that, but it's obviously a tradeoff. This query is big and complicated, and the contention we might run into could be worth a simpler query. I personally try very hard to avoid explicit transactions, but I'm probably overly-sensitive to that.
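The four request shapes above could be modeled as an enum over the allocation source. This is a hypothetical illustration; the actual parameter types in the PR may be shaped differently:

```rust
use std::net::IpAddr;

/// The four ways a subnet allocation can be requested; the generated SQL
/// differs per variant. Names here are made up for illustration.
#[allow(dead_code)]
#[derive(Debug)]
enum SubnetSource {
    /// An explicit IP subnet, e.g. 192.0.2.0/26.
    Explicit { addr: IpAddr, prefix: u8 },
    /// A specific pool, by name or ID.
    Pool(String),
    /// The default pool of a given IP version, among pools linked to the
    /// current silo.
    DefaultForVersion(u8),
    /// The single default pool linked to the silo.
    Default,
}

fn describe(source: &SubnetSource) -> &'static str {
    match source {
        SubnetSource::Explicit { .. } => "explicit subnet",
        SubnetSource::Pool(_) => "explicit pool",
        SubnetSource::DefaultForVersion(_) => "default pool for version",
        SubnetSource::Default => "default pool",
    }
}

fn main() {
    let src = SubnetSource::Pool(String::from("customer-pool"));
    assert_eq!(describe(&src), "explicit pool");
    println!("{:?} -> {}", src, describe(&src));
}
```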
FelixMcFelix left a comment
Thanks Ben, I have a few higher-level questions but otherwise this looks like it's handling some very tricky problems correctly. Thanks a lot for the robust testing here, particularly around gap sizing and allocation.
```rust
/// Create a new Subnet Pool Member.
///
/// IP subnets must be unique across all Subnet Pool Members, in all pools.
/// Any request to create a member with an overlapping IP subnet will fail.
pub async fn create_subnet_pool_member(
    &self,
    opctx: &OpContext,
    authz_pool: &authz::SubnetPool,
    params: &params::SubnetPoolMemberCreate,
) -> CreateResult<SubnetPoolMember> {
    opctx.authorize(authz::Action::CreateChild, authz_pool).await?;
    let member = SubnetPoolMember::new(params, authz_pool.id())?;
    insert_subnet_pool_member_query(&member)
```
We'll also need to prevent overlap with any IP pools, otherwise we might end up with some very strange NAT table behaviour at the switch. That ask is probably more relevant to the mechanics of insert_subnet_pool_member_query.
From the outside I had thought that Pool-typed AddressLots would be the central point for resource allocation between subnet pools and IP pools. But some digging pointed me at #3448, so that wouldn't work without an almighty migration that is 100% not in scope (and probably would be wasted given the rework proposed in RFD533). Practically I think this means that `insert_subnet_pool_member_query` and `queries::ip_pool::FilterOverlappingIpRanges` need to account for both of these tables?
Oof good catch. I will see about adding that to both of these queries. Depending on how bad that is, I might defer that to a follow-up PR. This one is already enormous.
```rust
    authz_project: &authz::Project,
    params: params::ExternalSubnetCreate,
) -> CreateResult<ExternalSubnet> {
    opctx.authorize(authz::Action::CreateChild, authz_project).await?;
```
Should we also require CreateChild permissions on the pool that the subnet is drawn from, in keeping with the EIP model?
That would be nice, but I don't think we can do that directly. We don't know which pool is being drawn from until we run the query, at least in the current formulation. The only case I'm aware of where we do that for EIPs is when allocating a service IP. Today at least, we know all the pools it could come from, and so we can do that check before running the allocation query.
I see, thanks for the explanation there.
- Fix a couple of index and schema update fragment bugs
- Comments on some subtle / confusing database details
- Handle overflow at the very end of IP address space
- Remove `silo_id` from the `external_subnet` table
- Various typo fixes
Thanks @FelixMcFelix and @mergeconflict for the thoughtful reviews. I've addressed almost all the comments. There are two outstanding items.
Sorry to say both of these will add even more to the PR. At least it's mostly generated code? 😬