Skip to content

Feature detection is not thread-safe #560

@Andreas-Krebbel

Description

@Andreas-Krebbel

On S/390 I ran into problem with the way the VXE2 hardware feature is currently being detected in multi-threaded code. It uses a SIGILL signal handler and sigjmp/longjmp in order to detect whether a certain instruction is available or not. Although I've looked into it for S/390 this should apply to other platforms using the same mechanism.

The problem is triggered by the way PyTorch is using Sleef:
pytorch/pytorch#128503

But it can also be reproduced with a small example like this:

#include <stdio.h>
#include "sleef.h"

#ifndef NUM_THREADS
#define NUM_THREADS 4
#endif

int main() {
  __vector float out[NUM_THREADS];

#pragma omp parallel for num_threads(NUM_THREADS)
  for (int i = 0; i < NUM_THREADS; i++)
    out[i] = Sleef_expf4_u10 ((__vector float){1.0f + i, 2.0f + i , 3.0f + i, 4.0f + i});

  for (int i = 0; i < NUM_THREADS;i++)
    {
      for (int j = 0; j < 4; j++)
	printf("%6.3f ", out[i][j]);
      printf("\n");
    }
  printf("\n");
}

Running the test like this:
gcc -DNUM_THREADS=4 t.c -O3 -mzvector -march=z15 -lsleef -fopenmp -lgomp -o t && ./t
results in either broken results or crashes

While the single threaded version works fine:
gcc -DNUM_THREADS=1 t.c -O3 -mzvector -march=z15 -lsleef -fopenmp -lgomp -o t && ./t

The cpuSupportsExt function uses the file scope variable sigjmp to store the execution status what makes this function thread-unsafe.

I will send a PR to check HWCAPs instead of using the signal handler. This fixes the problem for S/390. I think other platforms might need similar adjustments.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions