Scripts and config files for SLURM. This repo accompanies the blog post at https://mtreviso.github.io/blog/slurm.html
SLURM Documentation: https://slurm.schedmd.com/documentation.html
Starting from the controller node:
sudo apt install slurmd slurmctld
For the compute nodes:
sudo apt install slurmd slurm-client
The easiest way is to copy the slurm.conf file, edit it as needed, and save it as /etc/slurm/slurm.conf. The key settings are:
- ClusterName: sardine-cluster
- SlurmctldHost: artemis (the output of hostname -f)
- NodeName: artemis (the output of hostname -f)
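As a rough sketch of how these three settings fit together, the snippet below prints the matching slurm.conf lines for the current machine. It assumes one host acts as both controller and compute node, as in the artemis example; the key_settings function name is made up for illustration. A real NodeName line usually also lists hardware details, which running slurmd -C on the node will print.

```shell
# Hypothetical helper: print the key slurm.conf lines for a given host.
# Assumes a single machine acting as both controller and compute node.
key_settings() {
    host="$1"
    printf 'ClusterName=%s\n' "sardine-cluster"
    printf 'SlurmctldHost=%s\n' "$host"
    printf 'NodeName=%s\n' "$host"
}

# Use the fully qualified hostname, falling back to the short name.
key_settings "$(hostname -f 2>/dev/null || hostname)"
```

The output is meant to be compared against the edited /etc/slurm/slurm.conf, not pasted in as a complete configuration.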
Register the cluster (sardine-cluster) with Slurm accounting:
sudo sacctmgr add cluster sardine-cluster
Add QOSs:
sudo sacctmgr add qos cpu set priority=10 MaxJobsPerUser=4 MaxTRESPerUser=cpu=32,mem=128G,gres/gpu=0;
sudo sacctmgr add qos gpu-debug set priority=10 MaxJobsPerUser=1 MaxTRESPerUser=gres/gpu=8 MaxWallDurationPerJob=01:00:00;
sudo sacctmgr add qos gpu-short set priority=10 MaxJobsPerUser=2 MaxTRESPerUser=gres/gpu=4 MaxWallDurationPerJob=04:00:00;
sudo sacctmgr add qos gpu-medium set priority=5 MaxJobsPerUser=1 MaxTRESPerUser=gres/gpu=4 MaxWallDurationPerJob=48:00:00;
sudo sacctmgr add qos gpu-long set priority=2 MaxJobsPerUser=2 MaxTRESPerUser=gres/gpu=2 MaxWallDurationPerJob=168:00:00;
sudo sacctmgr add qos gpu-hero set priority=100 MaxJobsPerUser=8 MaxTRESPerUser=gres/gpu=8;
Add users with QOS (for example):
sudo sacctmgr create user --immediate name=mtreviso account=sardine QOS=cpu,gpu-debug,gpu-short,gpu-medium,gpu-long;
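To confirm that the QOSs and the user association were stored, something along these lines may help. It is only a sketch: check_accounting is a hypothetical helper name, and it assumes sacctmgr is on the PATH of the machine where you run it.

```shell
# Hypothetical helper: list the defined QOSs and one user's associations.
check_accounting() {
    user="${1:-$USER}"
    if ! command -v sacctmgr >/dev/null 2>&1; then
        echo "sacctmgr not found; run this on a machine with Slurm installed" >&2
        return 1
    fi
    # Show every QOS with its limits.
    sacctmgr show qos
    # Show the user together with the account and QOS associations.
    sacctmgr show user "$user" withassoc
}
```

For example, check_accounting mtreviso should list the QOSs created above and show which of them are attached to that user.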
Copy psqueue.py and psinfo.py to /usr/local/bin (dropping the .py extension) and make them executable:
sudo cp psqueue.py /usr/local/bin/psqueue
sudo chmod +x /usr/local/bin/psqueue
sudo cp psinfo.py /usr/local/bin/psinfo
sudo chmod +x /usr/local/bin/psinfo
By default, the commands use rich if it is installed; otherwise, they display a plain-text table.
- PS: Both commands accept the --plain flag to force plain-text output.
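A quick way to predict which output a user will get is to test whether rich can be imported. The sketch below assumes the scripts run under python3 (check their shebang line to be sure); rich_status is a hypothetical helper name.

```shell
# Assumption: psqueue and psinfo run under python3 (check their shebang line).
rich_status() {
    if python3 -c 'import rich' >/dev/null 2>&1; then
        echo "rich is installed: expect the rich table"
    else
        echo "rich is not installed: expect the plain-text table"
    fi
}

rich_status
```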
Users who want the rich interface can install rich locally with the system's pip:
pip install --user rich
Add this to their .bashrc:
export PATH="$HOME/.local/bin:$PATH"
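If this step gets scripted for several users, the line is best appended only when it is not already present. A minimal sketch follows; add_local_bin_to_path is a hypothetical helper, and the default target of ~/.bashrc is an assumption about the user's shell.

```shell
# Hypothetical helper: append the PATH line to a shell rc file only once.
add_local_bin_to_path() {
    rc_file="${1:-$HOME/.bashrc}"
    # Single quotes keep $HOME and $PATH unexpanded in the written line.
    path_line='export PATH="$HOME/.local/bin:$PATH"'
    touch "$rc_file"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$path_line" "$rc_file"; then
        printf '%s\n' "$path_line" >> "$rc_file"
    fi
}
```

Calling add_local_bin_to_path with no argument edits ~/.bashrc; running it again leaves the file unchanged.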