The KU Community Cluster uses SLURM (Simple Linux Utility for Resource Management) for job scheduling and resource management.

Connecting

Step-by-Step
Step-by-step instructions on how to connect

The cluster uses your KU Online ID and password.

  • SSH: Use an SSH2 client to connect to hpc.crc.ku.edu. For example:
    ssh username@hpc.crc.ku.edu
    

    Replace username with your KU Online ID, then authenticate with your KU Online ID password. Alternatively, you can set up public-key authentication (see the sketch after this list). SSH connections to hpc.crc.ku.edu resolve to either of the following login nodes:

    submit1.hpc.crc.ku.edu
    submit2.hpc.crc.ku.edu
    
  • X2Go: X2Go is software that provides a graphical desktop session on the cluster, allowing you to run GUI applications such as MATLAB.
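
As mentioned in the SSH item above, public-key authentication lets you log in without typing your password each time. A minimal sketch, assuming an OpenSSH client on your local machine (the key type shown is just one common choice, not a KU requirement):

# Generate a key pair on your local machine (accept the default path, set a passphrase)
ssh-keygen -t ed25519

# Copy the public key to the cluster; you will be prompted for your KU Online ID password once
ssh-copy-id username@hpc.crc.ku.edu

# Subsequent logins authenticate with the key
ssh username@hpc.crc.ku.edu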

Campus

If you are connecting from any of the University of Kansas campuses, you may connect directly using the instructions above.

Off-Campus

If you wish to connect to the KU Community Cluster from off campus, you must first connect through KU Anywhere, the KU VPN. If you have multiple VPN entitlements, any one of them will work. Once the VPN connection is established, you may connect as instructed above.

Submitting Jobs

Maximum number of jobs
The maximum number of jobs a user can have submitted at one time is 5,000.
  • Batch jobs: To run a job in batch mode, use your favorite text editor to create a submission script: a file containing SLURM options and the instructions for running your job. All SLURM options are prefaced with #SBATCH, and you must specify the partition you wish to run in. Once your script is complete, submit the job to the cluster with the sbatch command.

    A submission script is simply a text file that contains your job parameters and the commands you wish to execute as part of your job. You can also load modules, set environment variables, or perform other setup tasks inside your submission script.

    sbatch example.sh

    You may also submit simple jobs directly from the command line:

    srun --partition=sixhour echo Hello World!

    Command-line options
    Command-line options will override SLURM options in your job script.
    
  • Interactive jobs: An interactive job allows you to open a shell on a compute node as if you had ssh'd into it. It is typically used for debugging and testing.

    To submit an interactive job, use the srun command. Again, you must specify which --partition you wish your job to run in.

    srun --time=4:00:00 --ntasks=1 --nodes=1 --partition=sixhour --pty /bin/bash -l

    In the example above, the job has requested:

    • --time=4:00:00 4 hours of run time for the job
    • --ntasks=1 1 task. By default, 1 core is given to each task.
    • --nodes=1 1 node
    • --partition=sixhour Run the job in the sixhour partition
    • --pty /bin/bash Interactive terminal running the /bin/bash shell
    • Flags such as --time, --ntasks, and --nodes are referred to as SLURM options.

    If you have ssh'd to the submit nodes with X11 forwarding enabled and wish to have X11 available in an interactive job, supply the --x11 flag:

    srun --time=4:00:00 --ntasks=4 --nodes=1 --partition=sixhour --x11 --pty /bin/bash -l

Submission Script

To run a job in batch mode on a high-performance computing system using SLURM, first prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to SLURM using the sbatch command.

A very basic job script might contain just a bash or tcsh shell script. However, SLURM job scripts most commonly contain at least one executable command preceded by a list of options that specify the resources and other attributes needed to execute the command (e.g., wall-clock time, the number of nodes and processors, and filenames for job output and errors). These options are prefaced with the #SBATCH directive and should precede any executable lines in your job script.

Additionally, your SLURM job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.

Default Options
If no SLURM options are given, default options are applied.

Tasks / Cores

SLURM is very explicit in how one requests cores and nodes. While extremely powerful, the three flags --nodes, --ntasks, and --cpus-per-task can be a bit confusing at first.

A task in this context can be thought of as a process: a multi-process program (e.g., MPI) consists of multiple tasks, while a multi-threaded program consists of a single task that can in turn use multiple CPUs. In SLURM, tasks are requested with the --ntasks flag, and CPUs for multithreaded programs are requested with the --cpus-per-task flag.
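
To illustrate the distinction, here is a sketch (not a complete script) of how the same 8-core request looks for a multi-process program versus a multi-threaded one:

# Multi-process (e.g., MPI): 8 tasks, 1 CPU per task
#SBATCH --ntasks=8

# Multi-threaded (e.g., OpenMP): 1 task using 8 CPUs on a single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8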

Single Core Job

The --mem option can be used to request the appropriate amount of memory for your job. Please make sure to test your application and set this value to a reasonable number based on actual memory use. The %j in the --output line tells SLURM to substitute the job ID in the name of the output file. You can also add --error with an error file name to separate output and error logs.

#!/bin/bash
#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --partition=sixhour           # Partition Name (Required)
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ku.edu      # Where to send mail	
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=0-00:05:00             # Time limit days-hrs:min:sec
#SBATCH --output=serial_test_%j.log   # Standard output and error log

pwd; hostname; date
 
module load python/3.6
 
echo "Running python script"
 
python /path/to/your/python/script/script.py
 
date

Threaded or multi-core job

This script can serve as a template for applications that are capable of using multiple processors on a single server or physical computer. These applications are commonly referred to as threaded, OpenMP, PTHREADS, or shared memory applications. While they can use multiple processors, they cannot make use of multiple servers and all the processors must be on the same node.

These applications require shared memory and can only run on one node; as such, it is important to remember the following:

  • You must set --ntasks=1, and then set --cpus-per-task to the number of threads you wish to use.
  • You must make the application aware of how many processors to use. How that is done depends on the application:
    • For some applications, set OMP_NUM_THREADS to a value less than or equal to the number of --cpus-per-task you set.
    • For some applications, use a command line option when calling that application.
#!/bin/bash
#SBATCH --job-name=parallel_job      # Job name
#SBATCH --partition=sixhour          # Partition Name (Required)
#SBATCH --mail-type=END,FAIL         # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ku.edu     # Where to send mail	
#SBATCH --ntasks=1                   # Run a single task	
#SBATCH --cpus-per-task=4            # Number of CPU cores per task
#SBATCH --mem-per-cpu=2gb            # Job memory request
#SBATCH --time=0-00:05:00            # Time limit days-hrs:min:sec
#SBATCH --output=parallel_%j.log     # Standard output and error log

pwd; hostname; date
 
echo "Running on $SLURM_CPUS_PER_TASK cores"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
 
module load compiler/gcc/6.3
 
/path/to/your/program

MPI job

These are applications that can use multiple processors, which may or may not be on multiple compute nodes. In SLURM, the --ntasks flag specifies the number of MPI tasks created for your job. Note that, even within the same job, multiple tasks do not necessarily run on a single node. Therefore, requesting the same number of CPUs as above, but with the --ntasks flag, could result in those CPUs being allocated on several distinct compute nodes.

For many users, differentiating between --ntasks and --cpus-per-task is sufficient. However, for more control over how SLURM lays out your job, you can add the --nodes and --ntasks-per-node flags. --nodes specifies how many nodes to allocate to your job. SLURM allocates your requested number of cores to a minimal number of nodes on the cluster, so if you request a small number of tasks it is very likely they will all land on the same node. To guarantee they are on the same node, set --nodes=1 (this is contingent on the node's CPU count; requesting more tasks than a single node can provide results in a job that will never run). Conversely, if you would like to ensure a specific layout, such as one task per node for memory, I/O, or other reasons, you can set --ntasks-per-node=1. Note that the following must be true:

ntasks-per-node * nodes >= ntasks

The job below requests 16 tasks per node across 2 nodes. By default, each task gets 1 core, so this job uses 32 cores. If --ntasks=16 were used instead, the job would only use 16 cores, and those cores could be allocated on any nodes in the partition, even split between multiple nodes.

#!/bin/bash

#SBATCH --partition=sixhour      # Partition Name (Required)
#SBATCH --ntasks-per-node=16     # 16 tasks per node with each task given 1 core
#SBATCH --nodes=2                # Run across 2 nodes
#SBATCH --constraint=ib          # Only nodes with Infiniband (ib)
#SBATCH --mem-per-cpu=4gb        # Job memory request
#SBATCH --time=0-06:00:00        # Time limit days-hrs:min:sec
#SBATCH --output=mpi_%j.log      # Standard output and error log
 
echo "Running on $SLURM_JOB_NODELIST nodes using $SLURM_CPUS_ON_NODE cores on each node"
 
mpirun /path/to/program

GPU or MIC jobs

GPU and MIC (Intel Xeon Phi) nodes can be requested using the generic consumable resource option (--gres=gpu or --gres=mic). There are 3 different types of GPU cards in the KU Community Cluster, set up as constraints. To run on a V100 GPU:

--gres=gpu --constraint=v100
Multiple GPUs
You may request multiple GPUs by changing the --gres value to --gres=gpu:2. Note that this value is per node.
For example, --nodes=2 --gres=gpu:2 will request 2 nodes with 2 GPUs each, for a total of 4 GPUs.
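
To request an interactive session on a node with a V100 GPU, the flags above can be combined with srun; this is an illustrative invocation, not a required form:

srun --partition=sixhour --time=1:00:00 --ntasks=1 --gres=gpu --constraint=v100 --pty /bin/bash -l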

The job below requests a single GPU node in the sixhour partition:

#!/bin/bash
#SBATCH --partition=sixhour   # Partition Name (Required)
#SBATCH --ntasks=1            # 1 task
#SBATCH --time=0-06:00:00     # Time limit days-hrs:min:sec
#SBATCH --gres=gpu            # 1 GPU
#SBATCH --output=gpu_%j.log   # Standard output and error log
 
module load singularity
CONTAINERS=/panfs/pfs.local/software/install/singularity/containers
singularity exec --nv $CONTAINERS/tensorflow-gpu-1.9.0.img python ./models/tutorials/image/mnist/convolutional.py

Common Commands

Submitting the Job

All Commands
List of all commands.

Submitting a SLURM job is done with the sbatch command. SLURM reads the submission script and schedules the job according to the description in it.

Submitting the job described above looks like this:

$ sbatch example.sh 
Submitted batch job 62

Checking Job Status

To check the status of your job, use the squeue command. It will provide information such as:

  • The State (ST) of the job:
    • R - Running
    • PD - Pending - Job is awaiting resource allocation.
    • Additional codes are available on the squeue page.
  • Job Name
  • Run Time
  • Nodes running the job

To check the status of jobs owned by a specific username, use the -u option:

$ squeue -u <username>
  JOBID PARTITION     NAME       USER  ST       TIME  NODES NODELIST(REASON)
     65   sixhour hello-wo <username>   R       0:56      1 g004

Additionally, if you want to see the status of a specific partition, for example one that your owner group submits to, you can use the -p option to squeue:

$ squeue -p sixhour
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  73435  sixhour  MyRandom  jayhawk   R   10:35:20      1 r10r29n1
  73436  sixhour  MyRandom  jayhawk   R   10:35:20      1 r10r29n1
  73735  sixhour  SW2_driv   bigjay   R   10:14:11      1 r31r29n1
  73736  sixhour  SW2_driv   bigjay   R   10:14:11      1 r31r29n1

Checking Job Start

You may view the expected start time of your pending jobs with the command squeue --start.

$ squeue --start --user jayhawk
  JOBID  PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)
   5822    sixhour  Jobname   bigjay  PD  2018-08-24T00:05:09      3 (Priority)
   5823    sixhour  Jobname   bigjay  PD  2018-08-24T00:07:39      3 (Priority)
   5824    sixhour  Jobname   bigjay  PD  2018-08-24T00:09:09      3 (Priority)
   5825    sixhour  Jobname   bigjay  PD  2018-08-24T00:12:09      3 (Priority)
   5826    sixhour  Jobname   bigjay  PD  2018-08-24T00:12:39      3 (Priority)
   5827    sixhour  Jobname   bigjay  PD  2018-08-24T00:12:39      3 (Priority)
   5828    sixhour  Jobname   bigjay  PD  2018-08-24T00:12:39      3 (Priority)
   5829    sixhour  Jobname   bigjay  PD  2018-08-24T00:13:09      3 (Priority)
   5830    sixhour  Jobname   bigjay  PD  2018-08-24T00:13:09      3 (Priority)
   5831    sixhour  Jobname   bigjay  PD  2018-08-24T00:14:09      3 (Priority)
   5832    sixhour  Jobname   bigjay  PD                  N/A      3 (Priority)

The output shows the expected start time of the jobs, as well as the reason that the jobs are currently idle (in this case, low priority of the user due to running numerous jobs already).

Removing the Job

Removing a job is done with the scancel command. The only argument to scancel is the job ID:

$ scancel 2234

Job History

sacct can be used to display usage for currently running jobs as well as for completed jobs. Its output can be customized with various options (see the --format example at the end of this section).

$ sacct -u <user>

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
170          parallel_+    sixhour        crc          4  COMPLETED      0:0 
170.batch         batch                   crc          4  COMPLETED      0:0 
171          parallel_+    sixhour        crc          4 CANCELLED+      0:0 
171.batch         batch                   crc          4  CANCELLED     0:15 

Show all job information starting from a specific date:

$ sacct --starttime 2014-07-01

Show job accounting information for a specific job:

$ sacct -j <jobid>
$ sacct -j <jobid> -l 
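
The output columns can also be customized with sacct's --format option. The field names below are standard sacct fields, chosen here only as an illustration:

$ sacct -u <user> --format=JobID,JobName,Partition,State,Elapsed,AllocCPUS,MaxRSS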

Node Features

Features are requested with the --constraint option. Because the cluster is a consortium of varied hardware, these features allow you to specify which type of node you wish to use (e.g., ib, edr_ib, intel).

#SBATCH --constraint "intel"
#SBATCH --constraint "intel&ib"
Feature   Description
intel     Intel CPUs
amd       AMD CPUs
ib        At least FDR Infiniband connections
edr_ib    EDR Infiniband connections
noib      Without Infiniband connections
k40       NVIDIA K40 GPUs. Must also request the --gres option to be assigned a GPU.
k80       NVIDIA K80 GPUs. Must also request the --gres option to be assigned a GPU.
v100      NVIDIA V100 GPUs. Must also request the --gres option to be assigned a GPU.

Partitions

Each owner group has its own partition (e.g., bi, compbio, crmda). You can view the partitions you can submit to by running mystats.

  • Max walltime of owner partitions: 60-00:00:00 (60 days)
Job Partition
You must specify --partition for your job.
There is no default partition.

Six Hour

In addition to the owner group partitions, there is a sixhour partition. This partition allows your jobs to run across all idle nodes in the cluster, but is limited to a walltime of 6 hours.

To run in the sixhour partition, specify #SBATCH --partition sixhour in your job script.

SLURM Options

All options below are prefixed with #SBATCH. For example:

#SBATCH --partition=sixhour
#SBATCH --job-name=Jobname

This is a brief list of the most commonly used SLURM options. All options can be found in the SLURM documentation.

Option Abbreviation
Almost all options have a single letter abbreviation.
Option Function
-a, --array=<indexes> Submit a job array, multiple jobs to be executed with identical parameters (see the example after this table).
-c, --cpus-per-task=<ncpus> Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.
-C, --constraint=<list> Request which features the job requires.
-d, --dependency=<dependency_list> Defer the start of this job until the specified dependencies have been satisfied.
-D, --chdir=<directory> Set the working directory of the batch script to directory before it is executed.
-e, --error=<filename pattern> Instruct Slurm to connect the batch script's standard error directly to the file name specified in the "filename pattern". By default both standard output and standard error are directed to the same file.
--export=<environment variables [ALL] | NONE> Identify which environment variables are propagated to the launched application. By default, all are propagated. Multiple environment variable names should be comma-separated.
--gres=<list> Specifies a comma delimited list of generic consumable resources. The format of each entry on the list is "name[:count]". Example: "--gres=gpu:2"
-J, --job-name=<jobname> Specify a name for the job allocation.
--mail-type=<type> Notify user by email when certain event types occur. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL
--mail-user=<user> User to receive email notification of state changes as defined by --mail-type.
--mem=<size[units]> Specify the real memory required per node. Default units are megabytes. See Memory Limits
--mem-per-cpu=<size[units]> Minimum memory required per allocated CPU. Default units are megabytes. See Memory Limits
-n, --ntasks=<number> sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.
--ntasks-per-node=<ntasks> Request that ntasks be invoked on each node.
-N, --nodes=<minnodes[-maxnodes]> Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.
-o, --output=<filename pattern> Instruct Slurm to connect the batch script's standard output directly to the file name specified in the "filename pattern". By default both standard output and standard error are directed to the same file.
-p, --partition=<partition_names> Request a specific partition for the resource allocation. If the job can use more than one partition, specify their names in a comma-separated list. Required.
-t, --time=<time> Set a limit on the total run time of the job allocation. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
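
For example, the --array option from the table above runs the same script once per index, with the SLURM_ARRAY_TASK_ID environment variable identifying each element. This is an illustrative sketch; the script name and input files are hypothetical:

#!/bin/bash
#SBATCH --partition=sixhour        # Partition Name (Required)
#SBATCH --job-name=array_test      # Job name
#SBATCH --array=1-10               # Run 10 array elements, indices 1-10
#SBATCH --ntasks=1                 # 1 task per array element
#SBATCH --time=0-01:00:00          # Time limit days-hrs:min:sec
#SBATCH --output=array_%A_%a.log   # %A = array job ID, %a = array index

# Each element processes its own (hypothetical) input file
python process.py input_${SLURM_ARRAY_TASK_ID}.txt

Similarly, the --dependency option can chain jobs from the command line, e.g. sbatch --dependency=afterok:12345 next_step.sh (the job ID and script name here are hypothetical).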

Default Options

If some options are not specified in the submission script, the following default values are applied:

  • Defaults:

    • --nodes=1
    • --cpus-per-task=1
    • --mem-per-cpu=2gb
    • --time=8:00:00 (8 hours) for owner partitions; the maximum is 60-00:00:00 (60 days).
    • --time=1:00:00 (1 hour) for the sixhour partition.

Memory Limits

We reserve a chunk of memory on each node for system services to prevent the node from crashing. The amount reserved varies with the total memory reported by the server.

This limit also applies when using --mem-per-cpu: multiply the number of cores requested per node by your --mem-per-cpu value and make sure the product does not exceed the allowed limit. For example, --ntasks-per-node=16 with --mem-per-cpu=4gb requests 64 GB on a node, which is more than the 61 GB allowed on a 64 GB node.

Total amount of memory on node    Amount allowed to request
32 GB                             30 GB
64 GB                             61 GB
128 GB                            125 GB
192 GB                            186 GB
256 GB                            250 GB
384 GB                            376 GB
512 GB                            503 GB
768 GB                            754 GB

SLURM Commands

Below are some common, useful SLURM commands:

SLURM Command Function
sacct Used to report job or job step accounting information about active or completed jobs.
sinfo Reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
srun Used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
squeue Reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
squeue -u <username> Display the jobs submitted by the specified <username>
squeue -p <partition> Display the jobs in the specified <partition>. (This will not show jobs submitted to the sixhour partition that may be running on an owner partition's nodes.)
scontrol show job <jobid> Check the status of a job (<jobid>).
squeue --start --job <jobid> Show an estimate of when your job (<jobid>) might start.
scontrol show nodes <node_name> Check the status of a node (<node_name>).
scancel <jobid> Cancel a job.

Cluster Support

If you need any help with the cluster or have general questions related to the cluster, please contact crchelp@ku.edu.

In your email, please include your submission script, any relevant log files, and the steps you took to reproduce the problem.
