fstrm causing the process to block on a "stuck" socket #69

@borjam


I discovered this while testing BIND 9.16 and 9.18 with dnstap.

In particular I was wondering about the scenario in which the named daemon would be feeding dnstap data to a misbehaving dnstap server process via a Unix domain socket.

So I created a dnstap server process and, after named connected to the socket, I sent the server a STOP signal so that it would stop consuming data from the socket.

After running a benchmark I saw that this wasn't affecting named (it didn't stop serving queries, etc.), but when trying to do a clean shutdown of the named process (either via rndc stop or a kill -TERM), the named process hangs until I either kill the dnstap server process or resume it with a kill -CONT.

I filed a bug for BIND, [https://gitlab.isc.org/isc-projects/bind9/-/issues/3382], and after verifying the issue they pointed to a bug in fstrm. Copying from their response: "Some quick analyses show that fstrm blocks on read(2), so fstrm_iothr_destroy() hangs while waiting for pthread_join() to complete."
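To illustrate the mechanism in that quote, here is a minimal sketch in Python. It is not fstrm code and not the test program from the ISC report. It only mimics the pattern under stated assumptions: a worker thread blocks reading from a Unix domain socket whose peer is silent, so joining that thread cannot complete until the peer goes away. The names `demo_blocked_join`, `writer_side` and `server_side` are hypothetical, and closing the peer socket stands in for killing the dnstap server.

```python
import socket
import threading


def demo_blocked_join(join_timeout=0.2):
    """Show a join that cannot complete while the worker is stuck in a read.

    Hypothetical stand-ins: `writer_side` plays the socket used by the
    I/O thread, `server_side` plays the stopped dnstap server.
    """
    writer_side, server_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    result = {}

    def io_thread():
        # Blocks until the peer sends data or closes the connection.
        # A stopped (SIGSTOP) peer does neither, so this never returns.
        result["data"] = writer_side.recv(4096)

    worker = threading.Thread(target=io_thread, daemon=True)
    worker.start()

    # An unbounded join here would hang, which is the behaviour the
    # report describes for pthread_join(). Use a bounded join to observe it.
    worker.join(join_timeout)
    hung = worker.is_alive()

    # Closing the peer stands in for killing the server: the blocked
    # read returns end-of-file and the worker can finish.
    server_side.close()
    worker.join(5.0)
    finished = not worker.is_alive()

    writer_side.close()
    return hung, finished, result.get("data")


if __name__ == "__main__":
    print(demo_blocked_join())
```

The bounded join is only there so the sketch terminates; it is not a proposed fix for fstrm.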

The bug report on ISC's GitLab contains a crude ktrace taken on FreeBSD and a simple program they wrote to check the issue in fstrm.
