I ran 3 WebArena evals on the same machine, each against its own WebArena instance and each in its own tmux session, and this seems to have prevented the hanging tasks from being killed. When I ran the same evals from different machines, I did not see any problems with hanging tasks.
I did some reading and found that running multiple Ray instances on the same machine can cause memory issues, but the answer was not very clear.