On S/390 I ran into a problem with the way the VXE2 hardware feature is currently detected in multi-threaded code. Detection uses a SIGILL signal handler together with sigsetjmp/siglongjmp to find out whether a certain instruction is available. Although I have only looked into this for S/390, it should apply to other platforms that use the same mechanism.
The problem is triggered by the way PyTorch is using Sleef:
pytorch/pytorch#128503
But it can also be reproduced with a small example like this:
#include <stdio.h>
#include "sleef.h"

#ifndef NUM_THREADS
#define NUM_THREADS 4
#endif

int main() {
  __vector float out[NUM_THREADS];

#pragma omp parallel for num_threads(NUM_THREADS)
  for (int i = 0; i < NUM_THREADS; i++)
    out[i] = Sleef_expf4_u10((__vector float){1.0f + i, 2.0f + i, 3.0f + i, 4.0f + i});

  for (int i = 0; i < NUM_THREADS; i++)
    {
      for (int j = 0; j < 4; j++)
        printf("%6.3f ", out[i][j]);
      printf("\n");
    }
  printf("\n");
}
Running the test like this:
gcc -DNUM_THREADS=4 t.c -O3 -mzvector -march=z15 -lsleef -fopenmp -lgomp -o t && ./t
results in either wrong output or crashes.
The single-threaded version works fine:
gcc -DNUM_THREADS=1 t.c -O3 -mzvector -march=z15 -lsleef -fopenmp -lgomp -o t && ./t
The cpuSupportsExt function uses the file-scope variable sigjmp to store the execution state, which makes the function thread-unsafe.
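To illustrate why this pattern is fragile, here is a minimal sketch of a SIGILL-based probe. It is not Sleef's actual source: the names probe_env, on_sigill, probe_instruction, ext_missing and ext_present are hypothetical, and raise(SIGILL) stands in for an unsupported instruction. The jump buffer is a single file-scope object and signal() changes the handler for the whole process, so two threads probing at the same time can overwrite each other's state, and a siglongjmp may then target a buffer that was set up on another thread's stack.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>

/* One jump buffer shared by every thread: this is the hazard. */
static sigjmp_buf probe_env;

static void on_sigill(int signum) {
    (void)signum;
    /* Jumps to whichever thread last called sigsetjmp(probe_env, ...). */
    siglongjmp(probe_env, 1);
}

/* Hypothetical probe: returns 1 if try_ext() ran without SIGILL, else 0. */
static int probe_instruction(void (*try_ext)(void)) {
    /* The signal disposition is process-wide, not per-thread. */
    void (*old_handler)(int) = signal(SIGILL, on_sigill);
    int ok;

    if (sigsetjmp(probe_env, 1) == 0) {
        try_ext();   /* would execute the instruction under test */
        ok = 1;
    } else {
        ok = 0;      /* reached via siglongjmp from the handler */
    }

    signal(SIGILL, old_handler);
    return ok;
}

/* Stand-ins for "instruction missing" and "instruction present". */
static void ext_missing(void) { raise(SIGILL); }
static void ext_present(void) { }
```

Called from a single thread, probe_instruction(ext_missing) yields 0 and probe_instruction(ext_present) yields 1. The sketch is only safe as long as no second thread enters the probe concurrently.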
I will send a PR to check HWCAPs instead of using the signal handler. This fixes the problem for S/390. I think other platforms might need similar adjustments.
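A HWCAP-based check could look roughly like the sketch below. It assumes Linux with glibc, where getauxval(AT_HWCAP) returns the hardware capability bits the kernel reports for the process. The function names hwcap_has_vxe2 and cpu_supports_vxe2 are hypothetical, and the fallback value for HWCAP_S390_VXRS_EXT2 is an assumption that should be verified against the platform's hwcap header. This is not necessarily what the PR will contain.

```c
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (Linux/glibc) */

/* Assumed bit for the vector-enhancements facility 2 (VXE2) on s390x.
   Normally provided by the system headers; the fallback is an assumption. */
#ifndef HWCAP_S390_VXRS_EXT2
#define HWCAP_S390_VXRS_EXT2 32768UL
#endif

/* Pure helper: tests the VXE2 bit in a given HWCAP word. */
static int hwcap_has_vxe2(unsigned long hwcap) {
    return (hwcap & HWCAP_S390_VXRS_EXT2) != 0;
}

/* Reads the auxiliary vector; no signal handler and no shared
   mutable state, so concurrent calls do not interfere. */
static int cpu_supports_vxe2(void) {
    return hwcap_has_vxe2(getauxval(AT_HWCAP));
}
```

Because the check only reads the auxiliary vector, it can be called from any number of threads without synchronization. The result is only meaningful on S/390; on other architectures the same bit position has a different meaning.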