@zeeshanlakhani
Contributor

Fixes #107, which now properly passes.

This adds VLAN-aware NAT ingress matching to actually prevent cross-VLAN translation. Previously, a packet arriving with VLAN 100 and destined for a multicast group configured for VLAN 200 would be NAT encapsulated and forwarded, effectively translating the packet into the wrong customer's network.

Includes:

  • NAT ingress table matching (mcast_nat.rs, mod.rs):

    • Add Ipv4VlanMatchKey and Ipv6VlanMatchKey that match on destination address, VLAN header validity, and VLAN ID
    • For groups with VLAN, install two entries: untagged (for decapsulated Geneve from underlay) and correctly tagged (for customer packets)
    • Packets with the wrong VLAN miss both entries and are not NAT encapsulated
  • Multicast router VLAN handling (sidecar.p4):

    • Strip incoming VLAN tag before routing lookup in MulticastRouter4/6
    • forward_vlan action re-adds the group's configured VLAN on egress
    • Prevents unintended VLAN translation at the routing stage
  • Rollback changes:

    • Remove dead NAT rollback branches for internal groups (no NAT entries)
    • Add rollback support for VLAN changes in NAT and route tables
  • Counter(s):

    • The underlay multicast counter condition was unreachable for packets tagged MULTICAST_TAG_UNDERLAY_EXTERNAL that weren't decapped. The check for == MULTICAST_TAG_UNDERLAY excluded these packets, causing them to fall through to the external counter.
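
A minimal sketch of the two-entry NAT ingress scheme described above, assuming exact-match semantics on all three key fields. `VlanMatchKey`, `keys_for_vlan_group`, and `nat_ingress_hit` are hypothetical names for illustration; the real `Ipv4VlanMatchKey` in mcast_nat.rs may use different fields and types, and the real lookup happens in the P4 table rather than in Rust.

```rust
use std::net::Ipv4Addr;

/// Hypothetical stand-in for a VLAN-aware NAT ingress match key:
/// destination address, VLAN header validity, and VLAN ID.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct VlanMatchKey {
    pub dst_addr: Ipv4Addr,
    pub vlan_valid: bool,
    pub vlan_id: u16,
}

/// For a group configured with a VLAN, build the two entries to install:
/// one untagged (decapsulated Geneve from the underlay) and one carrying
/// the group's own VLAN (customer packets).
pub fn keys_for_vlan_group(dst_addr: Ipv4Addr, vlan_id: u16) -> [VlanMatchKey; 2] {
    [
        VlanMatchKey { dst_addr, vlan_valid: false, vlan_id: 0 },
        VlanMatchKey { dst_addr, vlan_valid: true, vlan_id },
    ]
}

/// Simulate the table lookup (assumed exact match): a packet hits only if
/// its destination, VLAN validity, and VLAN ID all equal an installed key.
pub fn nat_ingress_hit(
    keys: &[VlanMatchKey],
    dst_addr: Ipv4Addr,
    pkt_vlan: Option<u16>,
) -> bool {
    let probe = VlanMatchKey {
        dst_addr,
        vlan_valid: pkt_vlan.is_some(),
        vlan_id: pkt_vlan.unwrap_or(0),
    };
    keys.contains(&probe)
}
```

With a group configured for VLAN 200, untagged packets and packets tagged 200 hit an entry, while a packet tagged 100 misses both and is therefore not NAT encapsulated.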

zeeshanlakhani and others added 20 commits December 1, 2025 01:43
This PR is a precursor to a follow-up PR that leverages
updated code in `omicron-common`. This gets dendrite
in line with Omicron `main`, capturing an upstream
type change that we needed to accommodate.
Previously, internal multicast groups accepted admin-scoped addresses
including admin-local (ff04), site-local (ff05), and org-local (ff08).
This narrows the scope to only admin-local (ff04::/16), which is what
Omicron *now* dictates.

- [ ] This should be merged after
    oxidecomputer/omicron#9450 is reviewed
    and merged into Omicron. We now make Dendrite/Dpd match Omicron
    consistently for validation.

Key changes:
  - Remove IPV6_SITE_LOCAL_PATTERN and IPV6_ORG_SCOPE_PATTERN from P4
  - Update P4 table entries to only match admin-local (size 4→2)
  - Add ADMIN_LOCAL_PREFIX const to dpd-types with RFC doc links
  - Update validation to use `is_admin_local_multicast()` from oxnet v0.1.4
  - Bump to API version 2 for doc changes (only)
  - Update README with OpenAPI generation instructions
  - Use new multicast subnet constants from `omicron-common` for validation
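
The narrowed scope check can be illustrated with a short sketch using only the standard library. `in_admin_local_prefix` is a hypothetical helper written for this example; it is not the `is_admin_local_multicast()` function from oxnet, whose signature is not shown here.

```rust
use std::net::Ipv6Addr;

/// Returns true when `addr` falls inside ff04::/16, the admin-local
/// multicast prefix. Site-local (ff05) and org-local (ff08) addresses
/// are rejected under the narrowed rule.
pub fn in_admin_local_prefix(addr: Ipv6Addr) -> bool {
    // A /16 prefix covers exactly the first 16-bit segment.
    addr.segments()[0] == 0xff04
}
```

Under this check, `ff04::1` is accepted while `ff05::1` and `ff08::1`, previously allowed for internal groups, are not.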
This diffset handles our move toward ASM groups being able to have source
IPs.

Includes:

- Replace IpSrc::Subnet with IpSrc::Any for any-source multicast filtering
- Add source filter normalization: when Any is present, collapse to single
  /0 entry; empty sources treated as allow-any
- API versioning
  - Add tag ownership validation for group updates and deletes (v4 API)
  - Add v2/v3 API version adapters for backward compatibility
- Fix test_service_ipv4_unknown_address to set NAT-only on correct port
  - #172
  - I had to fix this here, as it was failing consistently locally.
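
The source filter normalization could look roughly like the sketch below. The `IpSrc` enum here is a simplified stand-in (the `Exact` variant is an assumption for illustration), and representing an empty list as a single `Any` entry is one possible reading of "empty sources treated as allow-any", not necessarily what the branch does.

```rust
use std::net::IpAddr;

/// Simplified stand-in for the source filter type. `Exact` is assumed
/// for illustration; `Any` is the variant this change introduces.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum IpSrc {
    Exact(IpAddr),
    Any,
}

/// Normalize a list of source filters:
/// - if `Any` appears anywhere, collapse the list to a single `Any`;
/// - an empty list is treated as allow-any (assumed to normalize to `Any`);
/// - otherwise the list is returned unchanged.
pub fn normalize_sources(sources: &[IpSrc]) -> Vec<IpSrc> {
    if sources.is_empty() || sources.contains(&IpSrc::Any) {
        return vec![IpSrc::Any];
    }
    sources.to_vec()
}
```

Collapsing to one entry keeps the programmed table state independent of how many redundant filters a caller supplies alongside `Any`.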
…specification

Multicast groups now require tag-based validation for mutations (delete/update)
in API v4 (current). Tags are assigned at creation (user-provided or
auto-generated as `{uuid}:{group_ip}`) and are immutable. This prevents
accidental modification or deletion of groups associated with other components,
and works better with Omicron's retry model.

This commit also addresses several issues with this branch's multicast API
versioning scheme, including restructuring versions on this branch to
follow this ordering:
  - v4 = TAG_OWNERSHIP: required tags for delete/update as query params, IpSrc::Any
  - v3 = SOURCE_FILTER_ANY: optional tags, IpSrc::Any
  - (MCAST_DOCS_ADMIN_LOCAL was removed)
  - v2 = DUAL_STACK_NAT_WORKFLOW: optional tags, IpSrc::Subnet
  - v1 = INITIAL

API version 4 changes:
  - DELETE requires tag query parameter that must match the group's tag
  - PUT validates that the provided tag matches the existing group's tag
  - All response types have `tag: String` (always present, never null)

Backward compatibility (v1-v3):
  - DELETE does not require tag (handler looks up existing tag internally)
  - PUT tag is optional; if omitted, existing tag is preserved
  - Response types have `tag: Option<String>` for v3, converted from v4

Tag format: 1-80 ASCII bytes, alphanumeric plus hyphens, underscores,
colons, and periods. The constraint matches Omicron's database schema
(as updated there).
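
As a rough illustration of the tag format and the v4 ownership rule, the sketch below validates a tag and compares a caller-supplied tag against the stored one. `is_valid_tag` and `tag_matches` are hypothetical helpers for this example, not the functions used in dpd.

```rust
/// Maximum tag length in bytes, per the format described above.
pub const MAX_TAG_LEN: usize = 80;

/// Returns true when `tag` is 1-80 bytes long and every byte is an ASCII
/// letter or digit, or one of '-', '_', ':', '.'.
pub fn is_valid_tag(tag: &str) -> bool {
    (1..=MAX_TAG_LEN).contains(&tag.len())
        && tag
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b':' | b'.'))
}

/// v4-style ownership check: a mutation (update or delete) is allowed
/// only when the provided tag equals the tag stored with the group.
pub fn tag_matches(existing: &str, provided: &str) -> bool {
    existing == provided
}
```

The allowed character set covers the auto-generated `{uuid}:{group_ip}` form for both IPv4 group addresses (periods) and IPv6 group addresses (colons).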
…4::/16

As per Omicron, customers can use admin-local IPv6 multicast addresses
(ff04::/16) for external groups, except for the reserved underlay subnet
(ff04::/64) which is used for internal underlay multicast allocation.

Changes:
  - switch external-group validation from validate_not_admin_local_ipv6()
    to validate_not_underlay_subnet()
  - validate_not_underlay_subnet() now only rejects ff04::/64
  - validate_nat_target() now requires NAT IPs in UNDERLAY_MULTICAST_SUBNET
  - VLAN tagging logic uses UNDERLAY_MULTICAST_SUBNET,
    removed dead is_unique_local() check (multicast IPs can't be ULA)
  - removed unused IPV6_SCOPE_MASK, IPV6_ULA_MASK/PATTERN from P4
  - removed dead ULA entry from mcast_tag_check table (const size = 1)
  - Updated integration test assertions, added unit tests
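
The split between customer-usable admin-local addresses and the reserved underlay subnet can be sketched as follows. The function names below are hypothetical and the error type is simplified to `String`; the actual `validate_not_underlay_subnet()` and `validate_nat_target()` signatures are not shown in this PR text.

```rust
use std::net::Ipv6Addr;

/// Returns true when `addr` falls inside ff04::/64, the subnet reserved
/// for internal underlay multicast allocation.
pub fn in_underlay_subnet(addr: Ipv6Addr) -> bool {
    // A /64 prefix covers the first four 16-bit segments.
    let s = addr.segments();
    s[0] == 0xff04 && s[1] == 0 && s[2] == 0 && s[3] == 0
}

/// External groups may use any address except the underlay subnet.
/// (Other external-group checks are out of scope for this sketch.)
pub fn check_external_group(addr: Ipv6Addr) -> Result<(), String> {
    if in_underlay_subnet(addr) {
        Err(format!("{addr} is in the reserved underlay subnet ff04::/64"))
    } else {
        Ok(())
    }
}

/// NAT targets must sit inside the underlay subnet.
pub fn check_nat_target(addr: Ipv6Addr) -> Result<(), String> {
    if in_underlay_subnet(addr) {
        Ok(())
    } else {
        Err(format!("{addr} is outside the underlay subnet ff04::/64"))
    }
}
```

For example, `ff04:0:0:1::1` is admin-local but outside ff04::/64, so it passes the external-group check and fails the NAT target check; `ff04::1` behaves the opposite way.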
API v4 (MCAST_STRICT_UNDERLAY) changes:
  - Renamed AdminScopedIpv6 to UnderlayMulticastIpv6 to better reflect its
    purpose as the underlay multicast subnet type
  - Tightened validation from ff04::/16 (admin-local scope) to ff04::/64
    to match Omicron's UNDERLAY_MULTICAST_SUBNET allocation
  - Tag validation now required for update/delete operations (as before)

API v3 backward compatibility:
  - Added v3::AdminScopedIpv6 type that accepts the broader ff04::/16 range
  - v3 endpoints use v3::MulticastUnderlayGroupIpParam and convert to the
    v4 type with appropriate error handling for out-of-range addresses
@zeeshanlakhani zeeshanlakhani self-assigned this Jan 20, 2026
@zeeshanlakhani zeeshanlakhani deleted the zl/vlan-translation-107 branch January 22, 2026 07:55