Description
What version of nebula are you using? (nebula -version)
1.9.6
What operating system are you using?
Fedora Linux
Describe the Bug
Nebula seems to fail to maintain direct connections over IPv4 when IPv6 appears to be available, despite an IPv4 subnet range being specified in preferred_ranges in the config file.
My setup uses Nebula to connect multiple virtual servers and some clients together. For convenience and security reasons, the clients always use Nebula to communicate with the servers, regardless of whether they're locally connectable or roaming. The servers are behind a stateful firewall that blocks inbound and some outbound connections; server nodes punch out to allow Nebula connections. There are multiple lighthouse nodes, all of which can be reached directly without any firewall blocking access from server or client nodes (one on the LAN for HA purposes, and two external nodes that all other nodes can reach).
On initial connection while running locally I get a direct connection with good performance and low latency. However, after a period of time (anywhere from seconds to multiple minutes) the connection seems to fail over to a relayed connection: ping times show roughly double the latency of my ping to my external lighthouse, and performance suddenly degrades. The only log messages I can see are msg="Failed to write outgoing packet" error="listener is IPv4, but writing to IPv6 remote", over and over. I'm not sure why it's trying to use IPv6, given that all my nodes are configured to listen on IPv4 addresses only. Adding preferred_ranges: ["10.0.1.0/24"] (my LAN subnet) didn't help; both the server and the client still attempt, and fail, to connect over IPv6 and then seemingly fail over to a relayed connection instead of the known-working IPv4 direct connection.
(Note: I expect this is related to Nebula trying IPv6 connections when it isn't configured to listen for them, because that's what's turning up in the logs, but I could be wrong and something else could be forcing Nebula to switch to relayed connections. I can't really think of anything else though; if it were just difficulty punching out through the firewall, why would the direct connection work at the start?)
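I haven't tested this yet, but if lighthouse.remote_allow_list accepts IPv6 CIDRs (an assumption on my part), something like the sketch below might keep IPv6 candidates out of the lighthouse responses entirely. I'm including it only as a possible workaround I'm considering, not as a confirmed fix:

lighthouse:
  # ... same lighthouse settings as in the config further down ...
  remote_allow_list:
    # Assumption: an IPv6 deny-all entry is accepted here and stops
    # lighthouses from handing out IPv6 remotes for other nodes
    "::/0": false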
Logs from affected hosts
The only error messages in the logs are one of the two below:
level=error msg="Failed to send handshake message" error="listener is IPv4, but writing to IPv6 remote" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2853929362 localIndex=2853929362 remoteIndex=0 udpAddr="[<IPv6 address here>]:47473" vpnIp=y.y.y.y
level=error msg="Failed to write outgoing packet" certName=<node> error="listener is IPv4, but writing to IPv6 remote" localIndex=2500263824 remoteIndex=450316016 udpAddr="[<IPv6 address here>]:47473" vpnIp=y.y.y.y
Config files from affected hosts
I've snipped out irrelevant parts for brevity and redacted specific IP addresses.
static_host_map:
  "x.x.x.x": ["<public IPv4 address>:4242"]
  "x.x.x.x": ["<public IPv4 address>:4242"]
  "x.x.x.x": ["<LAN IPv4 address>:4242"]
lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "x.x.x.x"
    - "x.x.x.x"
    - "x.x.x.x"
  local_allow_list:
    "10.89.0.0/16": false # Don't advertise Podman network addresses
listen:
  host: 0.0.0.0
  port: 0
  send_recv_error: private
punchy:
  punch: true
  # only difference is here - the server also has respond: true set
cipher: chachapoly
preferred_ranges: ["10.0.1.0/24"]
relay:
  relays:
    - x.x.x.x
    - x.x.x.x
    - x.x.x.x
  am_relay: false
  use_relays: true
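If it helps with debugging, I can also enable Nebula's built-in SSH admin interface on the affected hosts to inspect tunnel state while this happens. A minimal sketch of that config (the key path, user, and key below are placeholders):

sshd:
  enabled: true
  listen: 127.0.0.1:2222
  host_key: ./ssh_host_ed25519_key  # placeholder path
  authorized_users:
    - user: admin                   # placeholder user
      keys:
        - "<ssh public key here>"

From that console I believe print-tunnel <vpn ip> (or list-hostmap) shows the current remote for a given host, which should make it clear whether traffic is going via the LAN IPv4 address or a relay.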