Job Control
Controlling Jobs
Job Control Command                        Explanation
squeue                                     View job and job step information for jobs managed by SLURM.
scontrol show node                         Show detailed information about compute nodes.
scontrol show partition <partition name>   Show detailed information about a specific partition/queue.
scontrol show job <job ID>                 Show detailed information about a specific job, or about all jobs if no job ID is given.
sinfo                                      View information about SLURM nodes and partitions/queues.
scancel <job ID>                           Cancel a job. Users can cancel their own jobs; root can cancel any job.
scontrol hold <job ID>                     Hold a job.
scontrol release <job ID>                  Release a held job.
sbalance                                   Check the available account balance.
Note: Users can check their /home and /scratch quotas using the "myquota" command.
Sample command outputs are given below.
List jobs:
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
106 standard slurm-jo user1 R 0:04 1 atom01
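When the queue is long, squeue output can be filtered with standard text tools. A minimal sketch counting running jobs (the sample lines from the listing above are embedded here so it runs without a cluster; on a live system you would pipe squeue itself):

```shell
# Count running jobs: skip the header, match "R" in the ST (5th) column.
printf '%s\n' \
  'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)' \
  '106 standard slurm-jo user1 R 0:04 1 atom01' |
awk 'NR > 1 && $5 == "R" { n++ } END { print n + 0 }'
```

squeue can also filter on the server side, e.g. `squeue -u <username>` for one user's jobs or `squeue -t RUNNING` for jobs in a given state.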
Get job details:
$ scontrol show job 106
JobId=106 Name=slurm-job.sh
UserId=user1(1001) GroupId=user1(1001)
Priority=4294901717 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:00:07 TimeLimit=14-00:00:0 TimeMin=N/A
SubmitTime=2013-01-26T12:55:02 EligibleTime=2013-01-26T12:55:02
StartTime=2013-01-26T12:55:02 EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=standard AllocNode:Sid=atom-head1:3526
ReqNodeList=(null) ExcNodeList=(null)
NodeList=atom01
BatchHost=atom01
NumNodes=1 NumCPUs=2 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=0 Contiguous=0 Licenses=(null) Network=(null)
Command=/home/user1/slurm/local/slurm-job.sh
WorkDir=/home/user1/slurm/local
Cancel a job (users can cancel their own jobs; root can cancel any job):
$ scancel 135
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Hold a job:
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
139 standard simple user1 PD 0:00 1 (Dependency)
138 standard simple user1 R 0:16 1 atom01
$ scontrol hold 139
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
139 standard simple user1 PD 0:00 1 (JobHeldUser)
138 standard simple user1 R 0:32 1 atom01
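The reason shown in the last squeue column is how you confirm the hold took effect (Dependency changes to JobHeldUser). A small sketch extracting the job ID and pending reason from the held line above (embedded here so it runs without a cluster):

```shell
# For pending jobs (ST == "PD"), print the job ID and the reason
# with the surrounding parentheses stripped.
printf '%s\n' \
  '139 standard simple user1 PD 0:00 1 (JobHeldUser)' |
awk '$5 == "PD" { gsub(/[()]/, "", $8); print $1, $8 }'
```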
Release a job:
$ scontrol release 139
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
139 standard simple user1 PD 0:00 1 (Dependency)
138 standard simple user1 R 0:46 1 atom01
View the available partitions/queues and node status:
$ sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
standard up 3-00:00:00 32/356/54/442 cn[001-384],gpu[001-022],hm[001-036]
gpu up 3-00:00:00 0/21/1/22 gpu[001-022]
hm up 3-00:00:00 0/35/1/36 hm[001-036]
standard-low* up 3-00:00:00 32/356/54/442 cn[001-384],gpu[001-022],hm[001-036]
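In the summary output, the NODES(A/I/O/T) column lists Allocated/Idle/Other/Total node counts, so A + I + O should equal T. A quick sanity check on the "standard" row above:

```shell
# Split the A/I/O/T field and verify allocated + idle + other == total.
echo '32/356/54/442' |
awk -F/ '{ print (($1 + $2 + $3 == $4) ? "consistent" : "inconsistent") }'
```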