Evaluating Jobs
Last updated on 2023-05-16
Estimated time: 22 minutes
Overview
Questions
- How to evaluate a completed job?
- How to set event notification for your jobs?
Objectives
- Explain Slurm environment variables.
- Demonstrate how to evaluate jobs and make use of multiple threads options.
Evaluating your Job
After a job has completed, you will need to evaluate how efficient it was and whether it ran successfully, or investigate why it failed.
The seff command provides a summary of any job.
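For example, to summarise a job after it has finished, pass its job ID to seff (11793501 is the example job ID shown in the output below):

BASH
seff 11793501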
Exercise 1: Run and evaluate job4.sh.
job4.sh is similar to job3.sh but contains only the bowtie2 command. Try submitting it. Is there an error? How would you fix it? Once it completes successfully, evaluate the job.
The job completes quickly, but not successfully:
OUTPUT
Job ID: 11793501
Cluster: milton
User/Group: iskander.j/allstaff
State: OUT_OF_MEMORY (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 00:00:01
CPU Efficiency: 50.00% of 00:00:02 core-walltime
Job Wall-clock time: 00:00:01
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 20.00 MB (10.00 MB/core)
Also, checking the job's output file:
OUTPUT
.........................<other output>
slurmstepd: error: Detected 1 oom_kill event in StepId=11793501.batch. Some of the step tasks have been OOM Killed.
This shows that the job was “OOM Killed”. OOM is an abbreviation of Out Of Memory, meaning the memory requested was not enough. Increase the memory request and try again until the job finishes successfully.
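As a minimal sketch, assuming job4.sh sets its memory request with the --mem option, the fix is to raise that value (the exact amount needed depends on your data):

BASH
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=1G        # increased from the original request; raise until seff no longer reports OUT_OF_MEMORY
# ... the bowtie2 command from job4.sh goes here ...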
Exercise 2: Run and evaluate job4.sh.
Now that the job runs successfully, can we make it faster?
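One possible approach, sketched below under the assumption that bowtie2 is the only command in the script, is to request more cores from Slurm and tell bowtie2 to use them with its -p (threads) option; the index and read file names here are placeholders, not the ones used in job4.sh:

BASH
#!/bin/bash
#SBATCH --cpus-per-task=4   # ask Slurm for more cores
#SBATCH --mem=1G
# run bowtie2 with as many threads as the cores requested above
bowtie2 -p 4 -x <index> -U <reads.fastq> -S <output.sam>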
Slurm Environment Variables
Slurm passes information about the running job, e.g. its working directory or the nodes allocated to it, to the job via environment variables. In addition to being available to your job, these can also be used to set program options, such as the number of threads to run, based on the CPUs allocated (see the example after the list below).
The following is a list of commonly used variables that are set by Slurm for each job:
- $SLURM_JOBID: job ID
- $SLURM_SUBMIT_DIR: submission directory
- $SLURM_SUBMIT_HOST: host submitted from
- $SLURM_JOB_NODELIST: list of nodes where cores are allocated
- $SLURM_CPUS_PER_TASK: number of cores allocated per task
- $SLURM_NTASKS: number of tasks assigned to the job
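For example, here is a minimal sketch of a job script that simply prints some of these variables, so you can see their values in the job's output file:

BASH
#!/bin/bash
#SBATCH --cpus-per-task=2
# print the values Slurm sets for this job
echo "Job ID:          $SLURM_JOBID"
echo "Submit dir/host: $SLURM_SUBMIT_DIR on $SLURM_SUBMIT_HOST"
echo "Node list:       $SLURM_JOB_NODELIST"
echo "CPUs per task:   $SLURM_CPUS_PER_TASK"
echo "Number of tasks: $SLURM_NTASKS"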
Exercise 3: Run job5.sh.
Can we make use of Slurm environment variables in job4.sh? Use $SLURM_CPUS_PER_TASK with the -p option instead of hard-coding a number.
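A minimal sketch of what the relevant lines might look like (the actual bowtie2 arguments in job5.sh may differ):

BASH
#!/bin/bash
#SBATCH --cpus-per-task=4
# let bowtie2 use however many CPUs Slurm allocated, instead of a hard-coded number
bowtie2 -p $SLURM_CPUS_PER_TASK -x <index> -U <reads.fastq> -S <output.sam>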
Key Points
- Use seff to evaluate completed jobs
- Slurm environment variables are handy to use in your scripts