Commit 97f4fd4

add Python 3.14 to CI
1 parent 65e4052 commit 97f4fd4

3 files changed: +74 -59 lines changed

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
         os: [ubuntu-latest, windows-latest, macOS-latest]

     steps:

duct.py

Lines changed: 29 additions & 53 deletions
@@ -57,6 +57,7 @@
 import signal
 import subprocess
 import threading
+import time

 try:
     from pathlib import PurePath
@@ -1249,61 +1250,36 @@ def __init__(self, *args, **kwargs):
         # blocked on wait().
         self._condvar = threading.Condition(threading.Lock())

-    def wait(self):
-        # Take the condvar/lock. If we need to do a blocking wait, we'll
-        # release it first.
+    def wait(self, timeout=None):
+        deadline = time.monotonic() + timeout if timeout is not None else None
+        # Start by taking the condvar/lock, but note that we need to release it
+        # before waiting, or else we'd block .poll(), .kill(), and other calls
+        # to .wait().
         with self._condvar:
-            # As long as another thread is in a blocking wait, sleep on the
-            # condvar. Once we wake from this sleep, it's overwhelmingly likely
-            # that the child will have been reaped, but we'll still check in
-            # case waiting fails somehow.
-            while self._someone_is_waiting:
-                self._condvar.wait()
-
-            # Check whether the child has already been reaped. (The .poll()
-            # below actually does the same check internally, so we could skip
-            # this, but it feels clearer to me to check it.)
-            if self._child.returncode is not None:
-                return self._child.returncode
-
-            # We have the condvar/lock, and we're the thread responsible for
-            # waiting on the child. We're about to release the lock and do a
-            # blocking, non-reaping wait. However, there's another subtle race
-            # condition we need to worry about, apart from the kill/wait PID
-            # reuse race that this class is all about. Here's a possible order
-            # of events:
-            #   1. The child process exits.
-            #   2. Lots of time passes. Now any waiter will certainly see that
-            #      the child has exited.
-            #   3. One thread calls .wait(). It acquires the lock, sets
-            #      _someone_is_waiting to True, and releases the lock, in
-            #      preparation for attempting its blocking wait.
-            #   4. Suddently, another thread swoops in and calls .poll().
-            #      That thread acquires the lock, sees _someone_is_waiting, and
-            #      returns None. ***THIS IS A BUG.***
-            #   5. The first thread sees that the child has exited, reacquires
-            #      the lock, and cleans up.
-            # If the call to .poll() were racing against the child's actual
-            # exit, then we wouldn't care what it returned. That would be an
-            # "honest" race, and it would be correct for the result to be a
-            # coin flip. But that's not what happens in this situation. Here
-            # the child has definitely exited, maybe seconds or minutes ago,
-            # and a single call to .poll() would certainly have returned
-            # non-None. It's only by racing against .wait() that this situation
-            # could incorrectly report that the child hasn't exited.
-            #
-            # To fix this, .wait() must do a non-blocking wait (that is,
-            # actually check the status of the child process) *before*
-            # releasing the lock. (In fact, doing it before acquiring the lock
-            # could also be correct.) If that returns false, then the only
-            # possible race is a race against the child itself, where again
-            # it's expected and fine for the result to be a coin flip.
-            if self._child.poll() is not None:
-                return self._child.returncode
+            while True:
+                now = time.monotonic()
+                if self._child.returncode is not None:
+                    # The child has already been reaped. Return its saved exit
+                    # status.
+                    return self._child.returncode
+                if deadline is not None and time.monotonic() > deadline:
+                    # The deadline has passed. Poll the child on the way out,
+                    # to make sure we always poll at least once.
+                    return self.poll()
+                if self._someone_is_waiting:
+                    # There is another blocking waiter. Sleep until it signals
+                    # the condvar or the deadline passes. Spurious wakeups are
+                    # acceptable here.
+                    condvar_timeout = deadline - now if deadline is not None else None
+                    self._condvar.wait(condvar_timeout)
+                else:
+                    # There are no other blocking waiters. Proceed to the
+                    # blocking wait.
+                    break

-            # Finally, before releasing the condvar/lock, set
-            # _someone_is_waiting. We must unset it before returning, or else
-            # we'll deadlock other callers.
+            # We are the blocking waiter. Set the _someone_is_waiting flag and
+            # release the lock (dedent) before blocking. After this, we must
+            # clear this flag and notify the condvar before returning.
             self._someone_is_waiting = True

         # Dedent to release the condvar/lock, and do the blocking wait. We have
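
The deadline handling in the new `wait(timeout)` can be looked at in isolation. The following is a minimal sketch of that pattern, not the code from `duct.py`: the `BusyFlag` class and its method names are hypothetical stand-ins for the `_someone_is_waiting` flag and its condvar. The relative timeout is converted to an absolute `time.monotonic()` deadline once, and every wakeup (including a spurious one) re-checks the flag and recomputes the remaining time.

```python
import threading
import time


class BusyFlag:
    """Hypothetical stand-in for a "someone is waiting" flag plus its condvar."""

    def __init__(self):
        self._condvar = threading.Condition(threading.Lock())
        self._busy = False

    def set_busy(self, busy):
        with self._condvar:
            self._busy = busy
            if not busy:
                # Wake every sleeper so each one can re-check the flag.
                self._condvar.notify_all()

    def wait_until_idle(self, timeout=None):
        """Return True once the flag is clear, or False if the deadline passes."""
        # Convert the relative timeout into an absolute deadline exactly once.
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._condvar:
            while self._busy:
                if deadline is None:
                    self._condvar.wait()
                    continue
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                # An early or spurious wakeup just loops and recomputes
                # `remaining` against the same deadline.
                self._condvar.wait(remaining)
            return True
```

Using an absolute deadline means repeated wakeups cannot extend the total wait beyond the timeout the caller asked for.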

gotchas.md

Lines changed: 44 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ affect the implementation.
2626
* [Matching platform case-sensitivity for environment variables](#matching-platform-case-sensitivity-for-environment-variables)
2727
* [Using IO threads to avoid blocking children](#using-io-threads-to-avoid-blocking-children)
2828
* [Killing grandchild processes?](#killing-grandchild-processes)
29+
* [Waiting with a timeout](#waiting-with-a-timeout)
2930

3031
## Reporting errors by default
3132

@@ -56,9 +57,10 @@ reading all of its input. Most standard libraries get this right.
5657

5758
Notably on Unix, this requires the process to suppress `SIGPIPE`.
5859
Implementations in languages that don't suppress `SIGPIPE` by default (C/C++?)
59-
have no choice but to set a signal handler from library code, which might
60-
conflict with application code or other libraries. There is no good solution to
61-
this problem.
60+
have to configure signal handling from library code, which might conflict with
61+
application code or other libraries in the rare case that something does want
62+
to receive that signal. (See [Waiting with a timeout](#waiting-with-a-timeout)
63+
below for more on handling signals from library code.)
6264

6365
## Cleaning up zombie children
6466

@@ -100,8 +102,7 @@ PID. It's not likely, but all of that could happen just before the call to
100102
is why the Rust standard library [doesn't allow shared access to child
101103
processes](https://doc.rust-lang.org/std/process/struct.Child.html#method.kill).
102104

103-
It's possible to avoid this race using a newer POSIX API called
104-
[`waitid`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html).
105+
It's possible to avoid this race using a newer POSIX API called [`waitid`].
105106
That function has a `WNOWAIT` flag that leaves the child in its zombie state,
106107
so that its PID isn't freed for reuse. That gives the waiting thread a chance
107108
to set a flag to block further kills, before reaping the child. Duct uses this
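
The `WNOWAIT` behavior described in this hunk can be tried from Python through `os.waitid`. This is a rough sketch under stated assumptions rather than Duct's implementation: `wait_without_reaping` and `run_demo` are hypothetical names, and `os.waitid` is missing on Windows and on some Unix builds of Python, so callers should check `hasattr(os, "waitid")` first.

```python
import os
import subprocess
import sys


def wait_without_reaping(pid):
    """Block until `pid` exits, leaving it as a zombie. Hypothetical helper."""
    # WEXITED selects children that have exited; WNOWAIT leaves the child in
    # a waitable state, so the kernel cannot reuse its PID yet.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)


def run_demo():
    """Spawn a child that exits with status 7 and wait for it in two steps."""
    child = subprocess.Popen([sys.executable, "-c", "raise SystemExit(7)"])
    wait_without_reaping(child.pid)
    # The child is still a zombie, so its PID still refers to it. Signal 0
    # only checks that the process exists and raises if it does not.
    os.kill(child.pid, 0)
    # A real implementation would set its "no more kills" flag here, under
    # a lock, and only then reap the child.
    return child.wait()
```

On a platform where `os.waitid` is available, `run_demo()` returns the child's exit status after the second, reaping wait.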
@@ -306,3 +307,41 @@ objects](https://docs.microsoft.com/en-us/windows/win32/procthread/job-objects))
 but even there it sounds like some important features aren't supported on
 Windows 7. Realistically, there won't be good techniques for Duct to use to
 solve this problem for many years.
+
+## Waiting with a timeout
+
+The Windows [`WaitForSingleObject`] function has a timeout argument, but the
+Unix [`waitpid`], [`waitid`], and [`pthread_join`] functions do not. That makes
+it complicated to do any sort of waiting with a timeout on Unix.
+
+- for threads we can add code
+- what does Python do?
+- PyEvent on thread exit
+- also their locks are actually all condvars on the inside and support
+  waiting with a timeout
+- for children we need to handle SIGCHLD
+- what does Python do? (timeout?)
+- OMG does Python have the wait/try_wait race?!
+- signal_hook_registry race condition
+- can we just have sigaction() write back to the global?
+- does Rust Child::wait fail to handle eintr?
+
+
+I want try_wait/poll to always actually check, because you might be calling it
+in response to SIGCHLD or something like that, and in that case it's
+unacceptable for a race against the blocking thread to cause you to return
+None.
+
+That means that all reaping should actually be done by calling into wait().
+
+That means that double locking is viable, like the old Python implementation.
+Maybe we don't need a condvar?
+- But what does that mean for timeouts? What's cleaner? Unix has SIGCHLD, and
+  Windows doesn't have to worry about reaping, but what would we need to do if
+  there was a timeout parameter to the Unix waitid? In that case we would need
+  the condvar.
+
+[`WaitForSingleObject`]: https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject
+[`waitpid`]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html
+[`waitid`]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html
+[`pthread_join`]: https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_join.html
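
Because the Unix wait functions take no timeout, one simple fallback is to poll the child in a loop with a capped, growing sleep. The sketch below illustrates that approach only. It is not the `SIGCHLD` design the notes above are weighing, and `wait_with_timeout` is a hypothetical helper, not part of Duct. It trades some wakeup latency for not touching signal handlers, and it always polls at least once, even when the timeout is zero.

```python
import subprocess
import sys
import time


def wait_with_timeout(child, timeout):
    """Return the exit status, or None if `timeout` seconds pass first.

    Hypothetical helper that polls with exponential backoff.
    """
    deadline = time.monotonic() + timeout
    delay = 0.001
    while True:
        # Non-blocking check; this reaps the child if it has exited.
        status = child.poll()
        if status is not None:
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        time.sleep(min(delay, remaining))
        # Cap the sleep so that an exit is still noticed promptly.
        delay = min(delay * 2, 0.05)


child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.3)"])
first = wait_with_timeout(child, 0.01)  # The deadline passes first: None.
second = wait_with_timeout(child, 10)  # The child exits well before this.
```

Here `first` is `None` because the child is still sleeping, and `second` is the child's exit status.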
