What is an HPC?
- Using High Performance Computing (HPC) typically involves connecting to very large computing systems that provide high computational power.
- These systems can be used to do work that would either be impossible or much slower on smaller systems.
- HPC resources are shared by multiple users.
- The resources found on independent compute nodes can vary in volume and type (amount of RAM, processor architecture, availability of shared filesystems, etc.).
- The standard method of interacting with HPC systems is via a command line interface.
Accessing Milton
- HPC systems typically provide login nodes and a set of compute nodes.
- Files saved on one node are available on all nodes.
- Milton has multiple different file systems that have different policies and characteristics.
- Throughout a research project, research data may move between file systems according to backup and retention requirements, and to improve performance.
Environment Modules
- Load software with `module load softwareName`.
- Unload software with `module unload` or `module purge`.
- The module system handles software versioning and package conflicts for you automatically.
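The module commands above might be combined in a session like the one below. This is a minimal sketch: `softwareName` is a placeholder rather than a real module on Milton, and the surrounding guard is an addition so the snippet exits cleanly on a machine without environment modules.

```shell
# Sketch of a typical module session. "softwareName" is a placeholder,
# not a real module name; substitute one reported by `module avail`.
if type module >/dev/null 2>&1; then
    module avail                 # list software that can be loaded
    module load softwareName     # add the software to your environment
    module list                  # confirm what is currently loaded
    module unload softwareName   # remove that one module again
    module purge                 # or remove every loaded module at once
else
    echo "module command not found; run this on a system with environment modules"
fi
```

Running `module list` after each step is a quick way to confirm that the environment is in the state you expect.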
Lunch Break
Introducing Slurm
- The scheduler handles how compute resources are shared between users.
- A job is just a shell script.
- Request slightly more resources than you will need.
- Backfilling improves system utilisation and maximises job throughput. You can take advantage of backfilling by requesting only what you need.
- Slurm on Milton has multiple partitions with different specifications to suit different types of jobs.
Submitting a Job
- `sbatch` is used to submit the job
- `squeue` is used to list jobs in the Slurm queue
  - passing the `-u <username>` option will show jobs for just that user.
- `sacct` is used to show job details
- `#SBATCH` directives are used in submission scripts to set Slurm directives
- Setting up job resources is a challenge and you might not get it right the first time
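The submission workflow described above can be sketched as follows. The file name `hello_job.sh`, the job name, and every resource value are illustrative assumptions, not Milton recommendations. The Slurm commands are guarded so the snippet still runs (and only writes the script) on a machine without Slurm.

```shell
# Write a minimal submission script. All names and resource values are
# illustrative assumptions; adjust them to what your job actually needs.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out

echo "Hello from $(hostname)"
EOF

bash -n hello_job.sh             # check the script for shell syntax errors

if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh          # submit; Slurm prints the new job ID
    squeue -u "$USER"            # list only your own queued and running jobs
    sacct -u "$USER"             # show accounting details for your recent jobs
else
    echo "Slurm not found; hello_job.sh was written but not submitted"
fi
```

In the `--output` pattern, `%j` is replaced by the job ID, so repeated submissions do not overwrite each other's output files.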
Evaluating Jobs
- Use `seff` to evaluate completed jobs
- Slurm environment variables are handy to use in your script
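As an illustration of Slurm environment variables inside a job script, the sketch below prints a few of them and uses one to choose a thread count. The fallback values after `:-` are an assumption added here so the snippet also runs outside a job, where these variables are unset.

```shell
# Print a few Slurm-provided variables. The ":-" defaults keep the script
# runnable outside a job, where these variables are not set.
echo "Job ID:        ${SLURM_JOB_ID:-not set}"
echo "Job name:      ${SLURM_JOB_NAME:-not set}"
echo "CPUs per task: ${SLURM_CPUS_PER_TASK:-1}"
echo "Submit dir:    ${SLURM_SUBMIT_DIR:-$PWD}"

# Size a multithreaded program to the allocation instead of hard-coding it.
# SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Would run with ${threads} thread(s)"

# Once the job has finished, pass its job ID to seff to compare the
# resources used with the resources requested.
```

Reading the thread count from the allocation keeps the script consistent with its `#SBATCH` request when you later change the resources.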
Break
Slurm Commands
- Slurm commands are handy to view information about queued jobs, nodes and partitions
- You will commonly use `sbatch`, `squeue`, `salloc`, `sinfo` and `sacct`
Interactive Slurm Jobs
- Use `salloc` to start a new interactive Slurm job on Milton.
- Use `--x11` with `salloc` to run remote graphics in your interactive job.
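An interactive session with graphics forwarding might be requested as sketched below. The resource values are illustrative assumptions, and the guard is an addition so the snippet does nothing harmful on a machine without Slurm. Remote graphics generally also require that your SSH connection to the cluster has X11 forwarding enabled.

```shell
# Request an interactive job. Resource values are illustrative assumptions;
# drop --x11 if you do not need graphical programs.
if command -v salloc >/dev/null 2>&1; then
    salloc --x11 --cpus-per-task=2 --mem=4G --time=01:00:00
else
    echo "salloc not found; run this on a system with Slurm"
fi
```

When you are finished, type `exit` to release the allocation.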