
Slurm Job Scheduling

The IST cluster uses Slurm to schedule jobs and enforce cluster policy, e.g. limits on the number of jobs, run time, and resources (nodes, CPUs, GPUs) per user, as well as scheduling policies.


Slurm Overview

Slurm is an open-source job scheduler responsible for allocating resources to each job, prioritizing user jobs, and monitoring them.

To submit a job to Slurm, users need to define job settings such as the account, partition, memory, and CPU cores. Slurm then allocates resources for the job and executes it. The partitions and the limits for each job are listed in the cluster policy section.
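For example, you can list the partitions and their limits directly with sinfo (a standard Slurm command; the format string below selects the partition name, time limit, node count, CPUs per node, and memory per node):

sinfo -o "%P %l %D %c %m"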

There are two ways to execute jobs (your scripts) via Slurm:

  1. sbatch: submit a batch script to a partition, e.g. sbatch helloworld.sub
  2. srun: run parallel jobs (a minimal example follows below)

We also provide a sinteractive command that starts an interactive session, which makes testing a program easier.
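A minimal srun sketch (this assumes your account is allowed to launch jobs directly with srun; the account and partition names are taken from the examples below):

srun --account=scads --partition=cpu --mem=50mb python helloworld.py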

Slurm Terminology

  • Account: an account in the Slurm accounting system. One user can belong to many accounts.
  • User: a user in the Linux system.
  • Partition: a job queue. Jobs from users are queued and executed in order.
  • Time: the time limit for a job.
  • Hardware Specification: the resources of each node, such as the number of nodes, cores, GPUs, and memory.
  • Quality of Service (QoS): limits the hardware resources and time available to each account or group of accounts.
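The myassoc command used below appears to be a site-specific helper; the underlying Slurm tool for inspecting your associations and QoS limits is sacctmgr (standard Slurm commands; the exact output depends on the cluster configuration):

sacctmgr show assoc user=$USER
sacctmgr show qos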

Example: Hello World

In this example, we will run helloworld.py using sbatch and sinteractive.

## helloworld.py
#!/usr/bin/env python
print("Hello Worldddd")

View the accounts and partitions available to your user:

[songpon@ist-frontend-001 ~]$ myassoc
   Account                  QOS   Def QOS  Partition
---------- -------------------- --------- ----------
   scads                                        cpu
   scads                                        dgx
   scads                                   bash-cpu
   scads                                   bash-dgx

SBATCH

Define the resources in a batch script (helloworld.sub):

#!/bin/bash -l
# Request 50 MB of memory on one node in the cpu partition,
# charged to the scads account
#SBATCH --mem=50mb
#SBATCH --nodes=1
#SBATCH --partition=cpu
#SBATCH --account=scads
srun python helloworld.py
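For larger jobs you would typically add more directives. A sketch of a GPU job (the dgx partition name comes from the myassoc output above; the --gres syntax assumes the cluster exposes GPUs through the standard gres plugin):

#!/bin/bash -l
# Job name and output file; %j expands to the Slurm job ID
#SBATCH --job-name=hello-gpu
#SBATCH --output=hello-gpu-%j.out
# Wall-clock limit (HH:MM:SS) and memory
#SBATCH --time=01:00:00
#SBATCH --mem=4gb
# One node, four CPU cores, one GPU
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --partition=dgx
#SBATCH --account=scads
srun python helloworld.py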

Submit the job to the queue:

sbatch helloworld.sub
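While the job is queued or running, you can check its state with squeue (standard Slurm; -u filters to your own jobs):

squeue -u $USER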

View the output:

[songpon@ist-frontend-001 test-job]$ ls
hello.sub  helloworld.py  output  slurm-1667.out  test.sh
[songpon@ist-frontend-001 test-job]$ cat slurm-1667.out
Hello Worldddd
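Once the job has finished, accounting data remains available through sacct (standard Slurm; the job ID 1667 is taken from the output above):

sacct -j 1667 --format=JobID,JobName,Partition,State,Elapsed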

SINTERACTIVE

Define the resources via sinteractive arguments, which are the same as the arguments for srun (see the srun documentation for more information):

sinteractive -A scads -p bash-cpu --mem=1gb -c 1 -N 1 
python helloworld.py
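Type exit to end the session and release the allocation. If sinteractive is unavailable, a plain srun pseudo-terminal session is the standard Slurm equivalent:

srun -A scads -p bash-cpu --mem=1gb -c 1 -N 1 --pty bash -l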