-
Couldn't load subscription status.
- Fork 5.6k
Description
Description
Hello!
We are currently trying to update Saltstack to 3007.0.
This issue was tested on the commit ID: 31c9d0d
When using TCP transport to run something like salt-call grains.get fqdn, logs will contain two types of errors:
-
When TCP PubChannel is closed:
2024-05-21 20:57:18,572 [salt.channel.client:279 ][ERROR ][410050] Caught exception in PubChannel connect callback AttributeError("'NoneType' object has no attribute 'fire_event'") Traceback (most recent call last): File "/opt/salt/channel/client.py", line 542, in connect_callback self.event.fire_event({"master": self.opts["master"]}, "__master_connected") ^^^^^^^^^^^^^^^^^^^^^ -
When TCP ReqChannel (the one wrapped in SyncWrapper) is closed:
2024-05-21 20:57:19,893 [salt.channel.client:279 ][DEBUG ][410050] Closing AsyncReqChannel instance 2024-05-21 20:57:19,895 [salt.loader.lazy :279 ][PROFILE ][410050] LazyLoader utils deepcopy with deepcopy took 0.001188 2024-05-21 20:57:19,913 [asyncio :1821][ERROR ][410050] Task was destroyed but it is pending! task: <Task pending name='Task-7' coro=<RequestClient._stream_return() done, defined at /opt/salt/transport/tcp.py:1756> wait_for=<Future finished exception=StreamClosedError('Stream is closed')>>
Setup
This is shortened configuration, since we use patched custom installation and I'm not sure what I'm allowed to show:
master:
- salt-master.example.org
ipv6: true
transport: tcp
random_master: True
log_level: debug
Should you find it insufficient, please let me know.
- on-prem machine
- VM (Virtualbox, KVM, etc. please specify)
- VM running on a cloud service, please be explicit and add details
- container (Kubernetes, Docker, containerd, etc. please specify)
- or a combination, please be explicit
- jails if it is FreeBSD
- classic packaging
- onedir packaging
- used bootstrap to install
Steps to Reproduce the behavior
- Install salt-master 3007 on remote
- Install salt-minion 3007 locally
- Run
salt-call -l info grains.get fqdnon local machine
Running it you should see something akin to:
[ERROR ] Caught exception in PubChannel connect callback AttributeError("'NoneType' object has no attribute 'fire_event'")
Traceback (most recent call last):
File "/opt/salt/channel/client.py", line 542, in connect_callback
self.event.fire_event({"master": self.opts["master"]}, "__master_connected")
^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'fire_event'
...
local:
salt-minion.example.org
Task was destroyed but it is pending!
task: <Task pending name='Task-7' coro=<RequestClient._stream_return() running at /opt/salt/transport/tcp.py:1761> wait_for=<Future finished exception=StreamClosedError('Stream is closed')>>
Expected behavior
Closing channels and transports should properly wait on underlying streams before closing them fully.
Screenshots
Probably inapplicable.
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)Salt Version:
Salt: 3007.0
Python Version:
Python: 3.12.3 (main, May 13 2024, 10:19:24) [Clang 16.0.6 ]
Dependency Versions:
cffi: 1.16.0
cherrypy: Not Installed
dateutil: 2.9.0.post0
docker-py: 7.0.0
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.4
libgit2: Not Installed
looseversion: 1.3.0
M2Crypto: 0.38.0
Mako: 1.3.3
msgpack: 1.0.8
msgpack-pure: Not Installed
mysql-python: 1.4.6
packaging: 21.3
pycparser: 2.22
pycrypto: Not Installed
pycryptodome: Not Installed
pygit2: Not Installed
python-gnupg: 0.5.2
PyYAML: 5.4.1
PyZMQ: 25.1.2
relenv: Not Installed
smmap: 5.0.1
timelib: 0.3.0
Tornado: 6.4
ZMQ: 4.1.2
Salt Package Information:
Package Type: Not Installed
System Versions:
dist: ubuntu 22.04 jammy
locale: utf-8
machine: x86_64
release: 5.15.0-30-generic
system: Linux
version: Ubuntu 22.04 jammyAdditional context
I was able to trace examples of code lines, where the errors happen.
For PubChannel, it happens in SMinion during master evaluation. Closing channel will both close underlying transport and destroy event queue as seen here: https://github.com/saltstack/salt/blob/v3007.0/salt/channel/client.py#L463-L470
However connect_callback and, by extension, self.event.fire_event still seem to run, causing None.fire_event
Somewhat same thing happens with SyncWrapper, where, as seen in RemotePillar, it both calls close method on AsyncReqChannel and destroys io_loop passed to it. However, as seen in error, underlying TCP transport fails to process StreamClosedError in time, leaving it as a Task in closed io_loop
Technically it also happens in TCP transport, but try catch wrapper simply catches StreamClosedError and puts it out as debug message.