Running a Job on the HEC

By: Henry Oldroyd

So you've successfully run a Jupyter notebook on the HEX. Now you've debugged it you may want to run in on the more powerful HEC. Here is a Guide for how to submit it as a batch job. This guide was based on this HEC page

Follow previous guide for getting through SSH to the HEC #

Running Hello World Job #

fist connect by SSH in command line:

ssh -X username@wayland-2022.hec.lancaster.ac.uk
(enter password)

wayland-2022%

Lets explore the file system:

wayland-2022% pwd
/home/hpc/04/username
wayland-2022% cd /storage/hpc/04/username
wayland-2022% cd /scratch/hpc/04/username
wayland-2022% cd /tmp
wayland-2022% cd /home/hpc/04/username
wayland-2022%

You can see the directories here of each of types of storage available on the HEC: See File Store

File Area Quota Backup Policy File Retention Policy
home 10G Nightly for 90 days For the lifetime of the account
storage 200G None For the lifetime of the account
scratch 10T None Files automatically deleted after 4 weeks
temp Unlimited None Files automatically deleted at the end of the job

Then create a new python file to print a message:

See this HEC page for linux help

Then create a HEC job file to run this python program

wayland-2022% nano hello_world.py

(Then type in the editor)

print('Hello from the HEC')

You can then see the file contents with

wayland-2022% cat hello_world.py
print('Hello from the HEC')
wayland-2022%

Now you need to make a job file to run this python program The HEC uses a SLURM based system for scheduling batch jobs. You define the job in a .com file using the SLURM syntax Use this template to run the hello world job

Repeat the above steps to create a file called hello_job.com

#!/bin/bash 
#SBATCH -p serial
#SBATCH -J helloTest


echo Job running on compute node `uname -n`

python3 /home/hpc/04/oldroydh/hello_world.py

Then use this console command to submit the job:

sbatch hello_job.com

Then you can use the command

squeue --me

to see the job you have submitted. (If you don't see anything it may have already finished)

wayland-2022% sbatch hello_job.com
Submitted batch job 9775217
wayland-2022% squeue --me
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           9775217    serial helloTes oldroydh  R       0:01      1 comp10-09
wayland-2022% ls helloTest*
helloTest.9775217.err  helloTest.9775217.out
wayland-2022%

The number given is the job id. You should then notice that there are some new files in your home directory.

wayland-2022% cat helloTest.9775217.err
wayland-2022% cat helloTest.9775217.out
Job running on compute node comp10-09
Hello from the HEC
wayland-2022%

The .err file is empty as there were no errors in the job. The .out file shows the jobs output which are as expected

(add details about how to execute a notebook with tools like https://github.com/jupyter/nbconvert)