-
Notifications
You must be signed in to change notification settings - Fork 67
ls-apis needs to detect cycles in dependency unit graph #9707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
davepacheco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sunshowers what do you think of this?
dev-tools/ls-apis/src/system_apis.rs
Outdated
| // XXX-dap relationships we want to ignore because | ||
| // they're local-machine only | ||
| match (server_pkg.as_str(), client_pkg.as_str()) { | ||
| ("ddmd", "dpd-client") | ||
| | ("dpd", "gateway-client") | ||
| | ("lldpd", "dpd-client") | ||
| | ("mgd", "ddm-admin-client") | ||
| | ("mgd", "dpd-client") | ||
| | ("mgd", "gateway-client") | ||
| | ("omicron-sled-agent", "gateway-client") | ||
| | ("omicron-sled-agent", "ddm-admin-client") | ||
| | ("omicron-sled-agent", "dpd-client") // XXX-dap | ||
| | ("omicron-sled-agent", "mg-admin-client") | ||
| | ("omicron-sled-agent", "propolis-client") | ||
| | ("tfportd", "dpd-client") | ||
| | ("tfportd", "lldpd-client") | ||
| | ("wicketd", _) => continue, | ||
| _ => (), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the prototype-y bit. This match is saying: ignore dependencies from the component on the left to the API on the right. I basically added all of the intra-host-OS dependencies here for testing. We'll want to vet each one of these relationships to make sure they're really okay.
If this overall approach is promising, then this should be driven by metadata in api-manifest.toml, not a hardcoded match.
Most importantly: if you comment out L529 (the one from omicron-sled-agent to dpd-client), cargo xtask openapi check reports the error we want:
$ cargo xtask ls-apis check
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.44s
Running `target/debug/xtask ls-apis check`
Compiling omicron-ls-apis v0.1.0 (/home/dap/omicron-clean/dev-tools/ls-apis)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.34s
Running `target/debug/ls-apis check`
loading metadata for workspace omicron from current workspace
loading metadata for workspace propolis from /home/dap/.cargo/git/checkouts/propolis-d68c8bd1bc59c9bd/2aa7f9d/Cargo.toml
loading metadata for workspace lldp from /home/dap/.cargo/git/checkouts/lldp-d47de417041f191b/61479b6/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-ae9f1715c17fc765/606c0be/Cargo.toml
loading metadata for workspace crucible from /home/dap/.cargo/git/checkouts/crucible-0a48bd218bc2bbbc/7103cd3/Cargo.toml
loading metadata for workspace maghemite from /home/dap/.cargo/git/checkouts/maghemite-c0236f0fd3d582b6/396bb3c/Cargo.toml
loading metadata for workspace dendrite from /home/dap/.cargo/git/checkouts/dendrite-ae9f1715c17fc765/ab30fa9/Cargo.toml
note: ignoring Cargo dependency from crucible-pantry -> ... -> crucible-control-client
note: ignoring Cargo dependency from omicron-sled-agent -> dns-server
Error: graph of server-managed API dependencies between deployment units has a cycle (includes node: "Host OS")
The error message isn't that great but the assumption is that you're only going to see this when you've added a new edge and it'll probably be obvious what it was. That was not the case here, though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, what we care about here is primarily host OS services on one sled calling out to host OS services on another, right?
I think this direction makes sense, but I also feel like we need some kind of static analysis to differentiate between localhost and non-localhost clients. Maybe something like another client layer on top of the existing one which only allows localhost clients to be created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think that would be fantastic, but hard in practice because as I recall, the software involved actually does take an IP/port to use for dependencies as command-line arguments that come up in via SMF, which in turn come from sled agent. So the correctness depends at runtime on the fact that sled agent always points them at the same system.
I could also imagine wanting to be able to develop these components and point them at components on other systems.
But if there were a way to make this work, I agree that'd be a useful step.
dev-tools/ls-apis/src/system_apis.rs
Outdated
| | ("mgd", "ddm-admin-client") | ||
| | ("mgd", "dpd-client") | ||
| | ("mgd", "gateway-client") | ||
| | ("omicron-sled-agent", "gateway-client") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ug. Every sled-agent also talks to both MGS instances. I'm not sure it strictly needs to, but would need to talk to the networking folks more to understand if we can take it out. This happens here:
omicron/sled-agent/src/bootstrap/early_networking.rs
Lines 282 to 291 in 3b27679
| info!( | |
| self.log, "Querying MGS to determine switch location"; | |
| "addr" => %addr, | |
| ); | |
| let switch_slot = mgs_client | |
| .sp_local_switch_id() | |
| .await | |
| .with_context(|| format!("Failed to query MGS at {addr}"))? | |
| .into_inner() | |
| .slot; |
The callers are:
- RSS, so it can pass a
HashMap<SwitchLocation, Ipv6Addr>to Nexus (seems like in principle Nexus could figure this out on its own?) - OPTE port creation, so it match up "which switch(es) has/have configured uplinks" with "what's the IP of the dendrite on that switch" (maybe we could always try to look up all dendrite instances regardless of configured uplinks, which means we wouldn't need to talk to MGS?)
934b040 to
5e04994
Compare
Fixes #9706.
This is a prototype.