Running EESSI on the Canadian cluster Narval with srun, I see these warnings (not fatal):
$ srun --mpi=pmix ./hellompi
...
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
PMIx stopped checking at the first component that it did not find.
Host: nc11103.narval.calcul.quebec
Framework: psec
Component: munge
...
Hello world from processor nc11104.narval.calcul.quebec, rank 3 out of 4 processors
Hello world from processor nc11103.narval.calcul.quebec, rank 0 out of 4 processors
Hello world from processor nc11103.narval.calcul.quebec, rank 1 out of 4 processors
Hello world from processor nc11104.narval.calcul.quebec, rank 2 out of 4 processors
This apparently happens when the libpmix that Slurm uses has been compiled with munge support but the PMIx in the module has not. (Caveat: I don't know what happens the other way around, i.e. system PMIx without munge but module PMIx with munge.)
The warning can be silenced using:
export PMIX_MCA_psec=^munge
However, it may be better to add libmunge to the compatibility layer and rebuild PMIx so that it is picked up. An example ebuild that we use ourselves is here:
https://github.com/ComputeCanada/gentoo-overlay/blob/main/sys-auth/munge/munge-0.5.15.ebuild
Of course libmunge could also be provided by a module.
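Until PMIx is rebuilt with munge support, the workaround above could be placed near the top of a job script. The following is only a sketch: the guard that leaves an existing PMIX_MCA_psec value untouched and the check for srun are my own additions (assumptions, not something EESSI or Slurm requires), while the variable value and the srun command line are taken from the report above.

```shell
# Exclude the munge psec component, but only if the user has not already
# chosen a psec setting (this guard is an assumption, not part of the report).
if [ -z "${PMIX_MCA_psec:-}" ]; then
    export PMIX_MCA_psec='^munge'
fi

# Launch exactly as in the report; skipped where srun is not available,
# so the snippet can be sourced harmlessly outside a Slurm environment.
if command -v srun >/dev/null 2>&1; then
    srun --mpi=pmix ./hellompi
fi
```

The value is quoted so the shell always treats the leading ^ literally. This only hides the warning; it does not make the munge component loadable.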