Skip to content

Conversation

@davepacheco
Copy link
Collaborator

Fixes #9706.

This is a prototype.

@davepacheco davepacheco self-assigned this Jan 22, 2026
Copy link
Collaborator Author

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunshowers what do you think of this?

Comment on lines 518 to 536
// XXX-dap relationships we want to ignore because
// they're local-machine only
match (server_pkg.as_str(), client_pkg.as_str()) {
("ddmd", "dpd-client")
| ("dpd", "gateway-client")
| ("lldpd", "dpd-client")
| ("mgd", "ddm-admin-client")
| ("mgd", "dpd-client")
| ("mgd", "gateway-client")
| ("omicron-sled-agent", "gateway-client")
| ("omicron-sled-agent", "ddm-admin-client")
| ("omicron-sled-agent", "dpd-client") // XXX-dap
| ("omicron-sled-agent", "mg-admin-client")
| ("omicron-sled-agent", "propolis-client")
| ("tfportd", "dpd-client")
| ("tfportd", "lldpd-client")
| ("wicketd", _) => continue,
_ => (),
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the prototype-y bit. This match is saying: ignore dependencies from the component on the left to the API on the right. I basically added all of the intra-host-OS dependencies here for testing. We'll want to vet each one of these relationships to make sure they're really okay.

If this overall approach is promising, then this should be driven by metadata in api-manifest.toml, not a hardcoded match.

Most importantly: if you comment out L529 (the one from omicron-sled-agent to dpd-client), cargo xtask openapi check reports the error we want:

$ cargo xtask ls-apis check
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.44s
     Running `target/debug/xtask ls-apis check`
   Compiling omicron-ls-apis v0.1.0 (/home/dap/omicron-clean/dev-tools/ls-apis)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.34s
     Running `target/debug/ls-apis check`
loading metadata for workspace omicron from current workspace
loading metadata for workspace propolis from /home/dap/.cargo/git/checkouts/propolis-d68c8bd1bc59c9bd/2aa7f9d/Cargo.toml
loading metadata for workspace lldp from /home/dap/.cargo/git/checkouts/lldp-d47de417041f191b/61479b6/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-ae9f1715c17fc765/606c0be/Cargo.toml
loading metadata for workspace crucible from /home/dap/.cargo/git/checkouts/crucible-0a48bd218bc2bbbc/7103cd3/Cargo.toml
loading metadata for workspace maghemite from /home/dap/.cargo/git/checkouts/maghemite-c0236f0fd3d582b6/396bb3c/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-ae9f1715c17fc765/ab30fa9/Cargo.toml
note: ignoring Cargo dependency from crucible-pantry -> ... -> crucible-control-client
note: ignoring Cargo dependency from omicron-sled-agent -> dns-server
Error: graph of server-managed API dependencies between deployment units has a cycle (includes node: "Host OS")

The error message isn't that great but the assumption is that you're only going to see this when you've added a new edge and it'll probably be obvious what it was. That was not the case here, though.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, what we care about here is primarily host OS services on one sled calling out to host OS services on another, right?

I think this direction makes sense, but I also feel like we need some kind of static analysis to differentiate between localhost and non-localhost clients. Maybe something like another client layer on top of the existing one which only allows localhost clients to be created.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think that would be fantastic, but hard in practice because as I recall, the software involved actually does take an IP/port to use for dependencies as command-line arguments that come up in via SMF, which in turn come from sled agent. So the correctness depends at runtime on the fact that sled agent always points them at the same system.

I could also imagine wanting to be able to develop these components and point them at components on other systems.

But if there were a way to make this work, I agree that'd be a useful step.

| ("mgd", "ddm-admin-client")
| ("mgd", "dpd-client")
| ("mgd", "gateway-client")
| ("omicron-sled-agent", "gateway-client")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ug. Every sled-agent also talks to both MGS instances. I'm not sure it strictly needs to, but would need to talk to the networking folks more to understand if we can take it out. This happens here:

info!(
self.log, "Querying MGS to determine switch location";
"addr" => %addr,
);
let switch_slot = mgs_client
.sp_local_switch_id()
.await
.with_context(|| format!("Failed to query MGS at {addr}"))?
.into_inner()
.slot;

The callers are:

  • RSS, so it can pass a HashMap<SwitchLocation, Ipv6Addr> to Nexus (seems like in principle Nexus could figure this out on its own?)
  • OPTE port creation, so it match up "which switch(es) has/have configured uplinks" with "what's the IP of the dendrite on that switch" (maybe we could always try to look up all dendrite instances regardless of configured uplinks, which means we wouldn't need to talk to MGS?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ls-apis missed the invalid dependency between sled agent and dpd

4 participants