trigger interrupts after long vector operations #994
Here's a program that should not need much memory at its peak. The program allocates a 4 MB string and then appends to it 1000 times, but each appended string is immediately discarded. Nevertheless, running the program on a 64-bit platform before this PR hits a peak memory use of 1.5 GB or so:
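The original program was not captured here; a minimal sketch along the lines described above (sizes and iteration count as stated, other details assumed) might look like:

```scheme
;; Sketch of the program described above: build a 4 MB string, then
;; append to it 1000 times, discarding each appended result.
(define big (make-string (* 4 1024 1024) #\a))

(let loop ([i 0])
  (when (< i 1000)
    (string-append big "x") ; result immediately discarded
    (loop (+ i 1))))
```

Each `string-append` allocates a fresh ~4 MB string that becomes garbage right away, so live memory should stay near 4 MB; the 1.5 GB peak comes from GC interrupts not firing often enough.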
The problem is that the implementation of `string-append` performs each 4 MB allocation as an atomic-seeming kernel step, including a copy via `memcpy`, as opposed to a loop in Scheme, where the trap register would be decremented every time around the loop. In other words, a large amount of work is done, but it is treated as effectively constant for the purposes of deciding when to fire interrupts, including GC interrupts. The `vector-append` operation does not use `memcpy`, but it uses a hand-coded loop that (before this PR) similarly did not adjust the trap register. Operations that don't allocate, such as `bytevector-fill!`, won't create GC trouble, but infrequent timer-interrupt checking can interfere with using timers/engines for coroutines.

So, for operations that are atomic from the perspective of interrupts but that may work on large objects, such as `vector-append`, the change here adjusts the trap counter in proportion to the work done. That way, interrupts are dispatched in a more timely manner, especially GC interrupts.

(The change to "7.ms" is unrelated. Wrapping that test, with its smaller list size, in a loop could provoke a failure before these changes.)
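The contrast with a Scheme-level loop can be sketched as follows. This is a hypothetical reimplementation for illustration, not Chez Scheme's actual kernel code:

```scheme
;; A Scheme-level append: every loop iteration compiles to code that
;; passes through a trap check (the trap register is decremented each
;; time around), so timer and GC interrupts stay timely even when the
;; strings are large.  A single kernel-level memcpy, by contrast,
;; counted as one unit of work no matter how many bytes it copied.
(define (scheme-string-append a b)
  (let* ([la (string-length a)]
         [lb (string-length b)]
         [s  (make-string (+ la lb))])
    (do ([i 0 (+ i 1)]) ((= i la)) (string-set! s i (string-ref a i)))
    (do ([i 0 (+ i 1)]) ((= i lb)) (string-set! s (+ la i) (string-ref b i)))
    s))
```

The change here keeps the fast kernel-level copy but charges its length against the trap counter, approximating what the loop above gets for free.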
There should be a runtime cost, but it is small. The `string-append` function turns out to sometimes run a little faster on small strings, but that's because `memcpy` is now called via an `__atomic` foreign procedure. I've observed a slowdown as large as 10% for fast operations like `(#3%vector-set/copy '#(1) 0 1)` on x86_64, but the same example shows 0% difference on AArch64, and generally the differences are in the noise.

Unsafe list operations like `#3%length` or `#3%memq` have the same issue, but since it's only the unsafe versions of those functions (the safe versions of `length` and `memq` are normal Scheme code), and since those operations tend not to have an overall length straightforwardly available (except in the case of a `#3%length` result), there's no attempt to adjust them here.

Setting aside unsafe list operations, I'm not sure this commit covers all relevant operations. The "4.ms" changes show the ones that I found.