The KU Community Cluster uses SLURM (Simple Linux Utility for Resource Management) to manage job scheduling.
The cluster uses your KU Online ID and password.
- SSH: Use an SSH2 client to connect to username@hpc.crc.ku.edu, replacing username with your KU Online ID, and then authenticate with your KU Online ID password. Alternatively, you can set up public-key authentication. SSH connections to hpc.crc.ku.edu resolve to one of the cluster's login nodes.
- X2Go: X2Go is software which allows you to access the cluster using a graphical desktop window. This allows you to open GUI applications such as MATLAB on the cluster.
If you are connecting from any of the University of Kansas' campuses, you may connect as the instructions above show.
If you wish to connect to the KU Community Cluster from off campus, you must first connect through KU Anywhere. If you have multiple VPN Entitlements, any one of them will work. After a successful connection, you may connect as instructed above.
Maximum number of jobs: The maximum number of jobs a user can have submitted at one time is 5000.
Batch jobs: To run a job in batch mode, use your favorite text editor to create a file, called a submission script, which has SLURM options and instructions on how to run your job. All SLURM options are prefaced with #SBATCH. It is necessary to specify the partition you wish to run in. After your script is complete, you can submit the job to the cluster with the sbatch command.
A submission script is simply a text file that contains your job parameters and the commands you wish to execute as part of your job. You can also load modules, set environment variables, or perform other tasks inside your submission script.
You may also submit simple jobs from the command line:
srun --partition=sixhour echo Hello World!
Command-line options: Command-line options will override SLURM options in your job script.
Interactive jobs: An interactive job allows you to open a shell on the compute node as if you had ssh'd into it. It is usually used for debugging purposes.
To submit an interactive job, use the srun command. Again, you must specify which --partition you wish your job to run in.
srun --time=4:00:00 --ntasks=1 --nodes=1 --partition=sixhour --pty /bin/bash -l
In the example above, the job has requested:
- --time=4:00:00 requests 4 hours for the job to run.
- --ntasks=1 requests 1 task. By default, 1 core is given to each task.
- --partition=sixhour runs the job in the sixhour partition.
- --pty /bin/bash opens an interactive terminal running the /bin/bash shell.
--time, --ntasks, and --nodes are called options.
If you have ssh'd to the submit nodes with X11 forwarding enabled and wish to have X11 for an interactive job, then supply the --x11 flag:
srun --time=4:00:00 --ntasks=4 --nodes=1 --partition=sixhour --x11 --pty /bin/bash -l
To run a job in batch mode on a high-performance computing system using SLURM, first prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to SLURM using the sbatch command.
A very basic job script might contain just a bash or tcsh shell script. However, SLURM job scripts most commonly contain at least one executable command preceded by a list of options that specify resources and other attributes needed to execute the command (e.g., wall-clock time, the number of nodes and processors, and filenames for job output and errors). These options are prefaced with the #SBATCH directive and should precede any executable lines in your job script.
Additionally, your SLURM job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.
Default Options: If no SLURM options are given, default options are applied.
Slurm is very explicit in how one requests cores and nodes. While extremely powerful, the three flags --nodes, --ntasks, and --cpus-per-task can be a bit confusing at first.
The term task in this context can be thought of as a process. Therefore, a multi-process program (e.g. MPI) is composed of multiple tasks, and a multi-threaded program is composed of a single task, which can in turn use multiple CPUs. In SLURM, tasks are requested with the --ntasks flag. CPUs, for multithreaded programs, are requested with the --cpus-per-task flag.
The --mem option can be used to request the appropriate amount of memory for your job. Please make sure to test your application and set this value to a reasonable number based on actual memory use. The %j in the --output line tells SLURM to substitute the job ID in the name of the output file. You can also add --error with an error file name to separate output and error logs.
#!/bin/bash
#SBATCH --job-name=serial_job_test      # Job name
#SBATCH --partition=sixhour             # Partition Name (Required)
#SBATCH --mail-type=END,FAIL            # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@example.com   # Where to send mail
#SBATCH --ntasks=1                      # Run on a single CPU
#SBATCH --mem=1gb                       # Job memory request
#SBATCH --time=0-00:05:00               # Time limit days-hrs:min:sec
#SBATCH --output=serial_test_%j.log     # Standard output and error log

pwd; hostname; date

module load python/3.6

echo "Running python script"
python /path/to/your/python/script/script.py

date
This script can serve as a template for applications that are capable of using multiple processors on a single server or physical computer. These applications are commonly referred to as threaded, OpenMP, PTHREADS, or shared memory applications. While they can use multiple processors, they cannot make use of multiple servers and all the processors must be on the same node.
These applications require shared memory and can only run on one node; as such it is important to remember the following:
- You must set --ntasks=1, and then set --cpus-per-task to the number of threads you wish to use.
- You must make the application aware of how many processors to use. How that is done depends on the application:
  - For some applications, set OMP_NUM_THREADS to a value less than or equal to the number of --cpus-per-task you set.
  - For some applications, use a command line option when calling that application.
#!/bin/bash
#SBATCH --job-name=parallel_job                      # Job name
#SBATCH --partition=sixhour                          # Partition Name (Required)
#SBATCH --mail-type=END,FAIL                         # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=firstname.lastname@example.org   # Where to send mail
#SBATCH --ntasks=1                                   # Run a single task
#SBATCH --cpus-per-task=4                            # Number of CPU cores per task
#SBATCH --mem-per-cpu=2gb                            # Memory request per CPU core
#SBATCH --time=0-00:05:00                            # Time limit days-hrs:min:sec
#SBATCH --output=parallel_%j.log                     # Standard output and error log

pwd; hostname; date

echo "Running on $SLURM_CPUS_PER_TASK cores"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

module load StdEnv

/path/to/your/program
These are applications that can use multiple processors that may, or may not, be on multiple compute nodes. In SLURM, the --ntasks flag specifies the number of MPI tasks created for your job. Note that, even within the same job, multiple tasks do not necessarily run on a single node. Therefore, requesting the same number of CPUs as above, but with the --ntasks flag, could result in those CPUs being allocated on several, distinct compute nodes.
For many users, differentiating between --ntasks and --cpus-per-task is sufficient. However, for more control over how SLURM lays out your job, you can add the --nodes and --ntasks-per-node flags. --nodes specifies how many nodes to allocate to your job. SLURM will allocate your requested number of cores to a minimal number of nodes on the cluster, so if you request a small number of tasks, it is very likely that they will all be allocated on the same node. However, to ensure they are on the same node, set --nodes=1 (this is contingent on the number of CPUs; requesting too many may result in a job that will never run). Conversely, if you would like to ensure a specific layout, such as one task per node for memory, I/O, or other reasons, you can also set --ntasks-per-node=1. Note that the following must be true:
ntasks-per-node * nodes >= ntasks
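The constraint above can be checked before a job is submitted. The following is a minimal sketch rather than part of the cluster's tooling; the function name check_layout is hypothetical, and it only performs the arithmetic described above.

```shell
# check_layout NODES NTASKS_PER_NODE NTASKS
# Hypothetical helper: verifies that ntasks-per-node * nodes >= ntasks
# and reports how many task slots the requested layout provides.
check_layout() {
    nodes=$1
    per_node=$2
    ntasks=$3
    slots=$((nodes * per_node))
    if [ "$slots" -ge "$ntasks" ]; then
        echo "ok: $slots slots for $ntasks tasks"
    else
        echo "invalid: only $slots slots for $ntasks tasks"
        return 1
    fi
}

# 2 nodes with 16 tasks per node can hold 32 tasks.
check_layout 2 16 32
# 40 tasks do not fit into the same layout.
check_layout 2 16 40 || true
```

With one core per task (the default), the number of slots is also the number of cores the job will occupy.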
The job below requests 16 tasks per node on 2 nodes. By default, each task gets 1 core, so this job uses 32 cores. If the --ntasks=16 option were used instead, the job would use only 16 cores, which could be on any of the nodes in the partition, even split between multiple nodes.
#!/bin/bash
#SBATCH --partition=sixhour     # Partition Name (Required)
#SBATCH --ntasks-per-node=16    # 16 tasks per node with each task given 1 core
#SBATCH --nodes=2               # Run across 2 nodes
#SBATCH --constraint=ib         # Only nodes with Infiniband (ib)
#SBATCH --mem-per-cpu=4gb       # Memory request per CPU core
#SBATCH --time=0-06:00:00       # Time limit days-hrs:min:sec
#SBATCH --output=mpi_%j.log     # Standard output and error log

echo "Running on $SLURM_JOB_NODELIST nodes using $SLURM_CPUS_ON_NODE cores on each node"

mpirun /path/to/program
GPU and MIC (Intel Xeon Phi) nodes can be requested using the generic consumable resource option (--gres=gpu or --gres=mic). There are 5 different types of GPU cards in the KU Community Cluster, set up as features. To run on a specific card, such as a V100 GPU, request the GPU with --gres and select the card's feature with --constraint.
Multiple GPUs: You may request multiple GPUs by changing the --gres value, for example to --gres=gpu:2. Note that this value is per node. For example, --nodes=2 --gres=gpu:2 will request 2 nodes with 2 GPUs each, for a total of 4 GPUs.
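Because the --gres count applies to each node, the total number of GPUs a job receives is the node count multiplied by the per-node count. A small sketch of that arithmetic follows; the function name total_gpus is hypothetical.

```shell
# total_gpus NODES GPUS_PER_NODE
# Hypothetical helper: the --gres=gpu:N count is per node, so the job
# receives NODES * N GPUs in total.
total_gpus() {
    echo $(( $1 * $2 ))
}

# 2 nodes with 2 GPUs each
echo "--nodes=2 --gres=gpu:2 requests $(total_gpus 2 2) GPUs in total"
```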
Single/Double Precision: By default, a job in the sixhour partition may run on any GPU in the cluster. This includes GPUs that are only single precision capable. If you need double precision GPUs only, request the double precision feature with --constraint.
The job below requests a single GPU in the sixhour partition:
#!/bin/bash
#SBATCH --partition=sixhour    # Partition Name (Required)
#SBATCH --ntasks=1             # 1 task
#SBATCH --time=0-06:00:00      # Time limit days-hrs:min:sec
#SBATCH --gres=gpu             # 1 GPU
#SBATCH --output=gpu_%j.log    # Standard output and error log

module load singularity

CONTAINERS=/panfs/pfs.local/software/install/singularity/containers
singularity exec --nv $CONTAINERS/tensorflow-gpu-1.9.0.img python ./models/tutorials/image/mnist/convolutional.py
Submitting the Job
Submitting a SLURM job is done with the sbatch command. SLURM will read the submit file and schedule the job according to the description in the submit file.
To submit a job script saved as example.sh:
$ sbatch example.sh
Submitted batch job 62
Checking Job Status
To check the status of your job, use the squeue command. It will provide information such as:
- The State (ST) of the job:
- R - Running
- PD - Pending - Job is awaiting resource allocation.
- Additional codes are available on the squeue page.
- Job Name
- Run Time
- Nodes running the job
To check the status of jobs owned by a specific username, use the -u option:
$ squeue -u <username>
JOBID PARTITION     NAME       USER ST TIME NODES NODELIST(REASON)
   65   sixhour hello-wo <username>  R 0:56     1 g004
Additionally, if you want to see the status of a specific partition, for example one that you are a member of, you can use the -p option to squeue:
$ squeue -p sixhour
JOBID PARTITION     NAME    USER ST     TIME NODES NODELIST(REASON)
73435   sixhour MyRandom jayhawk  R 10:35:20     1 r10r29n1
73436   sixhour MyRandom jayhawk  R 10:35:20     1 r10r29n1
73735   sixhour SW2_driv  bigjay  R 10:14:11     1 r31r29n1
73736   sixhour SW2_driv  bigjay  R 10:14:11     1 r31r29n1
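When a partition is busy, it can help to summarize squeue output per user. The sketch below assumes the default column layout shown in the sample output (USER in the fourth column, job names without spaces); the function name jobs_per_user is hypothetical.

```shell
# jobs_per_user: reads squeue output on stdin and prints one
# "USER COUNT" line per user, sorted by user name.
# Assumes the default squeue columns, where USER is column 4.
jobs_per_user() {
    awk 'NR > 1 { count[$4]++ } END { for (u in count) print u, count[u] }' | sort
}

# On the cluster this would be run as: squeue -p sixhour | jobs_per_user
# Here the sample output from the text is used instead.
jobs_per_user <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
73435 sixhour MyRandom jayhawk R 10:35:20 1 r10r29n1
73436 sixhour MyRandom jayhawk R 10:35:20 1 r10r29n1
73735 sixhour SW2_driv bigjay R 10:14:11 1 r31r29n1
73736 sixhour SW2_driv bigjay R 10:14:11 1 r31r29n1
EOF
```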
Checking Job Start
You may view the start time of your job with the command squeue --start. The output of the command will show the expected start time of the jobs.
$ squeue --start --user bigjay
JOBID PARTITION    NAME   USER ST          START_TIME NODES NODELIST(REASON)
 5822   sixhour Jobname bigjay PD 2018-08-24T00:05:09     3 (Priority)
 5823   sixhour Jobname bigjay PD 2018-08-24T00:07:39     3 (Priority)
 5824   sixhour Jobname bigjay PD 2018-08-24T00:09:09     3 (Priority)
 5825   sixhour Jobname bigjay PD 2018-08-24T00:12:09     3 (Priority)
 5826   sixhour Jobname bigjay PD 2018-08-24T00:12:39     3 (Priority)
 5827   sixhour Jobname bigjay PD 2018-08-24T00:12:39     3 (Priority)
 5828   sixhour Jobname bigjay PD 2018-08-24T00:12:39     3 (Priority)
 5829   sixhour Jobname bigjay PD 2018-08-24T00:13:09     3 (Priority)
 5830   sixhour Jobname bigjay PD 2018-08-24T00:13:09     3 (Priority)
 5831   sixhour Jobname bigjay PD 2018-08-24T00:14:09     3 (Priority)
 5832   sixhour Jobname bigjay PD                 N/A     3 (Priority)
The output shows the expected start time of the jobs, as well as the reason that the jobs are currently idle (in this case, low priority of the user due to running numerous jobs already).
Removing the Job
Removing the job is done with the scancel command. The only argument to the scancel command is the job ID. The command is:
$ scancel 2234
sacct can be used to display currently running jobs and their usage, as well as previous job usage. Its output can be customized to show specific fields.
$ sacct -u <user>
170          parallel_+ sixhour crc 4 COMPLETED  0:0
170.batch    batch              crc 4 COMPLETED  0:0
171          parallel_+ sixhour crc 4 CANCELLED+ 0:0
171.batch    batch              crc 4 CANCELLED  0:15
Show all job information starting from a specific date:
$ sacct --starttime 2014-07-01
Show job accounting information for a specific job:
$ sacct -j <jobid>
$ sacct -j <jobid> -l
Features are requested with the --constraint option. Because the cluster is a consortium of hardware, features allow the user to specify which type of node they wish to use (e.g. ib, edr_ib, intel).
#SBATCH --constraint "intel"
#SBATCH --constraint "intel&ib"
||Code compiled on these nodes will run on all nodes in the cluster|
||At least FDR Infiniband connections|
||EDR Infiniband connections|
||Without Infiniband connections|
||AVX2 instruction set|
||AVX512 instruction set|
||NVIDIA K40 GPUs. Must request
||NVIDIA K80 GPUs. Must request
||NVIDIA V100 GPUs. Must request
||NVIDIA Quadro RTX 6000 GPUs. Must request
||NVIDIA Quadro RTX 8000 GPUs. Must request
||Double precision GPUs. Must request
Each owner group has its own partition (e.g. bi, compbio, crmda). You can view partitions you can submit to by running
- 60-00:00:00 (60 days): Max walltime of owner partitions
Job Partition: You must specify --partition for your job. There is no default partition.
Other than the owner group partitions, there is a sixhour partition. This partition allows your jobs to run on any IDLE node in the cluster, but is limited to a wall time of 6 hours.
To run in the sixhour partition, specify #SBATCH --partition sixhour in your job script.
All options below are prefixed with #SBATCH. For example:
#SBATCH --partition=sixhour
#SBATCH --job-name=Jobname
This is a brief list of the most commonly used SLURM options. All options can be found in the SLURM documentation.
Option Abbreviation: Almost all options have a single letter abbreviation.
|Option||Description|
|--array||Submit a job array, multiple jobs to be executed with identical parameters.|
|--cpus-per-task||Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.|
|--constraint||Request which features the job requires.|
|--dependency||Defer the start of this job until the specified dependencies have been satisfied.|
|--chdir||Set the working directory of the batch script to directory before it is executed.|
|--error||Instruct Slurm to connect the batch script's standard error directly to the file name specified in the "filename pattern". By default both standard output and standard error are directed to the same file.|
|--export||Identify which environment variables are propagated to the launched application; by default all are propagated. Multiple environment variable names should be comma separated.|
|--gres||Specifies a comma-delimited list of generic consumable resources. The format of each entry on the list is "name[:count]". Example: "--gres=gpu:2"|
|--job-name||Specify a name for the job allocation.|
|--mail-type||Notify user by email when certain event types occur. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL|
|--mail-user||User to receive email notification of state changes as defined by --mail-type.|
|--mem||Specify the real memory required per node. Default units are megabytes. See Memory Limits|
|--mem-per-cpu||Minimum memory required per allocated CPU. Default units are megabytes. See Memory Limits|
|--ntasks||sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.|
|--ntasks-per-node||Request that ntasks be invoked on each node.|
|--nodes||Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.|
|--output||Instruct Slurm to connect the batch script's standard output directly to the file name specified in the "filename pattern". By default both standard output and standard error are directed to the same file.|
|--partition||Request a specific partition for the resource allocation. If the job can use more than one partition, specify their names in a comma-separated list. Required.|
|--time||Set a limit on the total run time of the job allocation. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".|
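The time formats accepted for the run time limit can be easy to misread ("90" means 90 minutes, while "1-12" means 1 day and 12 hours). The following sketch converts each of the six listed formats to seconds so a value can be checked against a partition's wall time limit. The function name slurm_time_to_seconds is hypothetical and the parsing is an illustration of the formats listed above, not SLURM's own parser.

```shell
# slurm_time_to_seconds TIME
# Hypothetical helper: converts a SLURM time string to seconds.
# Handles: minutes, minutes:seconds, hours:minutes:seconds,
#          days-hours, days-hours:minutes, days-hours:minutes:seconds
slurm_time_to_seconds() {
    echo "$1" | awk '{
        days = 0; rest = $0
        n = split($0, d, "-")
        if (n == 2) { days = d[1]; rest = d[2] }
        m = split(rest, p, ":")
        if (n == 2) {
            # A day prefix means the remainder starts with hours.
            h = p[1]; mi = (m >= 2) ? p[2] : 0; s = (m >= 3) ? p[3] : 0
        } else if (m == 1) {
            h = 0; mi = p[1]; s = 0
        } else if (m == 2) {
            h = 0; mi = p[1]; s = p[2]
        } else {
            h = p[1]; mi = p[2]; s = p[3]
        }
        print days * 86400 + h * 3600 + mi * 60 + s
    }'
}

slurm_time_to_seconds 0-00:05:00   # 5 minutes
slurm_time_to_seconds 4:00:00      # 4 hours
slurm_time_to_seconds 90           # 90 minutes
slurm_time_to_seconds 1-12         # 1 day and 12 hours
```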
If some options are not specified in the submission, default values will be set:
- --time=8:00:00 (8 hours) for owner partitions
- --time=1:00:00 (1 hour) for the sixhour partition
We reserve a chunk of memory per node for system services to prevent the node from crashing. This varies with the amount of memory reported by the server.
This also goes for --mem-per-cpu. You'll have to take the number of cores requested per node and multiply that by your --mem-per-cpu value and make sure it does not go above the allowed limit.
|Total amount of memory on node||Amount allowed to request|
|32 GB||30 GB|
|64 GB||61 GB|
|128 GB||125 GB|
|192 GB||186 GB|
|256 GB||250 GB|
|384 GB||376 GB|
|512 GB||503 GB|
|768 GB||754 GB|
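The limits in the table can be turned into a quick pre-submission check. The sketch below hard-codes the table values and assumes whole-gigabyte per-CPU memory requests; the function names allowed_mem_gb and fits_on_node are hypothetical.

```shell
# allowed_mem_gb TOTAL_GB
# Prints the amount of memory that may be requested on a node of the
# given size, using the values from the table above. Unknown sizes
# return a non-zero status.
allowed_mem_gb() {
    case "$1" in
        32)  echo 30 ;;
        64)  echo 61 ;;
        128) echo 125 ;;
        192) echo 186 ;;
        256) echo 250 ;;
        384) echo 376 ;;
        512) echo 503 ;;
        768) echo 754 ;;
        *)   return 1 ;;
    esac
}

# fits_on_node TOTAL_GB CORES_PER_NODE MEM_PER_CPU_GB
# Succeeds when cores * per-CPU memory stays within the allowed amount.
fits_on_node() {
    limit=$(allowed_mem_gb "$1") || return 2
    [ $(( $2 * $3 )) -le "$limit" ]
}

if fits_on_node 128 16 4; then
    echo "16 cores x 4 GB fits on a 128 GB node"
fi
if ! fits_on_node 128 16 8; then
    echo "16 cores x 8 GB does not fit on a 128 GB node"
fi
```

The second case shows why the reserved memory matters: 16 x 8 GB equals the node's physical 128 GB, but exceeds the 125 GB that can actually be requested.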
Below are some common, useful SLURM commands:
|sacct||Used to report job or job step accounting information about active or completed jobs.|
|sinfo||Reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.|
|srun||Used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.|
|squeue||Reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.|
||Display the jobs submitted by the specified
||Display the jobs in the specified
||Check the status of a job (
||Show an estimate of when your job (
||Check the status of a node (
|scancel||Cancel a job.|
Making sure your jobs use the right amount of RAM and the right number of CPUs helps you and others using the cluster use these resources more efficiently, and in turn get work done more quickly. Below are some examples of how to measure your CPU and RAM usage so you can make this happen. Be sure to check the example SLURM submission scripts to request the correct number of resources.
CPU Percentage Used: By default, top reports CPU use as a percentage of a single CPU. On multi-core systems, you can have percentages that are greater than 100%. For example, if 3 cores are at 60% use, top will show a CPU use of 180%.
If you launch a program by putting /usr/bin/time -v in front of it, time will watch your program and provide statistics about the resources it used. Check Percent of CPU this job got: for how much CPU was used. Check Maximum resident set size (kbytes) for how much RAM the job used. For example:
/usr/bin/time -v stress -c 8 -t 10s
stress: info: dispatching hogs: 8 cpu, 0 io, 0 vm, 0 hdd
stress: info: successful run completed in 10s
Command being timed: "stress -c 8 -t 10s"
Percent of CPU this job got: 796%
Maximum resident set size (kbytes): 2368
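The two fields named above can be translated into a resource request. The sketch below rounds the CPU percentage up to whole cores and converts the peak memory to megabytes. The function name suggest_from_time is hypothetical, and it assumes the field labels printed by /usr/bin/time -v as shown in the example.

```shell
# suggest_from_time: reads "/usr/bin/time -v" output on stdin and prints
# a suggested CPU count (CPU percentage rounded up to whole cores) and
# the peak resident memory converted from KB to MB.
suggest_from_time() {
    awk -F': ' '
        /Percent of CPU this job got/ { sub(/%/, "", $2); pct = $2 }
        /Maximum resident set size/   { kb = $2 }
        END {
            cpus = int((pct + 99) / 100)
            printf "cpus=%d peak_mem_mb=%.1f\n", cpus, kb / 1024
        }'
}

# /usr/bin/time writes its report to stderr, so on the cluster this
# would be run as: /usr/bin/time -v ./program 2>&1 | suggest_from_time
# Here the sample report from the text is used instead.
suggest_from_time <<'EOF'
Command being timed: "stress -c 8 -t 10s"
Percent of CPU this job got: 796%
Maximum resident set size (kbytes): 2368
EOF
```

For the sample run this points to 8 CPUs, which matches the 8 workers that stress was asked to start.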
If your job is already running, you can check on its usage, but will have to wait until it has finished to find the maximum memory and CPU used. The easiest way to check the instantaneous memory and CPU usage of a job is to ssh to a compute node your job is running on. To find the node you should ssh to, run:
squeue -u $USER
  JOBID PARTITION   NAME     USER ST TIME NODES NODELIST(REASON)
1654654   sixhour abc123 r557e636  R 0:24     1 n259
Then use ssh to connect to a node your job is running on from the NODELIST column, for example ssh n259.
SSH to compute node: To access a compute node via ssh, you must have a job running on that compute node. Your ssh session will be bound by the same CPU, memory, and time limits your job requested.
Once you are on the compute node, run either ps or top. ps will give you instantaneous usage every time you run it. Here is some sample ps output:
ps -u $USER -o %cpu,rss,args
%CPU   RSS COMMAND
 0.0   588 stress -c 5 -t 10000s
98.2   204 stress -c 5 -t 10000s
98.2   204 stress -c 5 -t 10000s
98.2   204 stress -c 5 -t 10000s
98.2   204 stress -c 5 -t 10000s
98.2   204 stress -c 5 -t 10000s
ps reports memory used in kilobytes, so each of the 5 stress processes is using 204KB of RAM. They are also using most of 5 cores, so future jobs like this should request 5 CPUs.
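For a job with many processes, adding the columns up by hand gets tedious. The sketch below totals the %CPU and RSS columns of ps output in the format shown above; the function name summarize_ps is hypothetical.

```shell
# summarize_ps: reads "ps -o %cpu,rss,args" output on stdin and prints
# the summed CPU percentage and resident memory (RSS, in KB) over all
# listed processes. The first line is treated as the header.
summarize_ps() {
    awk 'NR > 1 { cpu += $1; rss += $2 }
         END { printf "total_cpu=%.1f%% total_rss_kb=%d\n", cpu, rss }'
}

# On a compute node this would be run as:
#   ps -u $USER -o %cpu,rss,args | summarize_ps
# Here the sample output from the text is used instead.
summarize_ps <<'EOF'
%CPU RSS COMMAND
0.0 588 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
EOF
```

Dividing the total CPU percentage by 100 and rounding up gives the number of CPUs the job is effectively using.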
top runs interactively and shows you live usage statistics. You can press u, enter your KU Online ID, then enter to filter just your processes. For Memory usage, the number you are interested in is RES. In the case below, the igblastn and perl programs are each consuming from 46MB to 348MB of memory and each fully utilizing one CPU. You can press q to quit.
top - 23:29:16 up 112 days, 1:00, 1 user, load average: 5.17, 5.16, 5.15
Tasks: 647 total, 6 running, 641 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.5%us, 1.1%sy, 0.0%ni, 73.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 125.989G total, 122.164G used, 3917.367M free, 388.625M buffers
Swap: 0.000k total, 0.000k used, 0.000k free, 118.752G cached
  PID USER     PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
16273 r557e636 20  0 96068  48m 5812 R 100.0  0.0 250:31.93 igblastn
16167 r557e636 20  0  316m 196m 1252 R 100.0  0.2   0:45.35 perl
16309 r557e636 20  0  468m 348m 1376 R 100.0  0.3  59:57.89 perl
16384 r557e636 20  0 94256  46m 5836 R 100.0  0.0 248:26.95 igblastn
16214 r557e636 20  0  194m  74m 1252 R  99.7  0.1   0:16.94 perl
Slurm records statistics for every job, including how much memory and CPU was used.
After the job completes, you can run seff jobid to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to.
seff 1620511
Job ID: 1620511
Cluster: ku_community_cluster
User/Group: r557e636/r557e636_g
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 8-19:03:16
CPU Efficiency: 99.87% of 8-19:19:34 core-walltime
Job Wall-clock time: 8-19:19:34
Memory Utilized: 66.96 MB
Memory Efficiency: 0.82% of 8.00 GB
The job above requested 1 core and 8GB of memory. It utilized the 1 core with 99.87% efficiency, but used only 0.82% of the 8GB of memory requested. Future jobs can probably request less memory if the input data is the same.
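The memory efficiency figure is simply the memory used divided by the memory requested. The sketch below reproduces that calculation and derives a smaller request for a similar job. Both function names are hypothetical, and the 20% headroom is an arbitrary assumption, not a cluster policy.

```shell
# mem_efficiency USED_MB REQUESTED_GB
# Percentage of the requested memory that the job actually used.
mem_efficiency() {
    awk -v used="$1" -v req="$2" \
        'BEGIN { printf "%.2f\n", used / (req * 1024) * 100 }'
}

# suggest_mem_mb USED_MB
# Peak usage plus 20% headroom, rounded up to a whole number of MB.
# The 20% margin is an assumption chosen for illustration.
suggest_mem_mb() {
    awk -v used="$1" \
        'BEGIN { v = used * 1.2; m = int(v); if (m < v) m++; print m }'
}

# Values taken from the seff report above: 66.96 MB used of 8 GB requested.
echo "efficiency: $(mem_efficiency 66.96 8)%"
echo "suggested request: --mem=$(suggest_mem_mb 66.96)mb"
```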
seff: If your job requests email to be sent for END or FAIL mail types, the seff information about that job will be sent in the body of the email.
You can also use the more flexible sacct to get that info, along with other more advanced job queries.
sacct -j 1620511 -o "JobID%20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AllocTRES%32"
JobID                JobName    User      Partition  NodeList        Elapsed    State      ExitCode MaxRSS     AllocTRES
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- --------------------------------
1620511              paper2tes+ r557e636  biostat    n146            8-19:19:34 COMPLETED  0:0                 billing=1,cpu=1,mem=8G,node=1
1620511.batch        batch                           n146            8-19:19:34 COMPLETED  0:0      68572K     cpu=1,mem=8G,node=1
1620511.extern       extern                          n146            8-19:19:34 COMPLETED  0:0      616K       billing=1,cpu=1,mem=8G,node=1
In order to ensure that all owner groups get their fair share of the cluster, we utilize SLURM's built-in job accounting and fairshare system. Every owner group is given a quantity of shares based on the number of SCUs they have purchased in the KU Community Cluster. The fairshare score of an owner group is then calculated from their share versus the amount of the cluster they have actually used. This fairshare score is then used to assign priority to their jobs relative to other users on the cluster. This keeps individual owner groups from monopolizing the resources in the sixhour partition, which would be unfair to owner groups who have not used their fair share for quite some time.
To see your fairshare score, run the sshare command:
Account              User       RawShares  NormShares  RawUsage    EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                       1.000000    2321244304  1.000000      0.500000
ku                              parent     1.000000    2321244304  1.000000      0.500000
crc                             2          0.005556    103440      0.000045      0.994453
crc                  r557e636   2          0.003704    103365      0.000045      0.991695
An account is the owner group's name. The CRC owns 2 nodes in the cluster, and thus their RawShares is equal to 2. The NormShares value is simply the Account's RawShares divided by the total number of RawShares given to all Accounts on the cluster. There are 359 total RawShares for all Accounts, and thus 2 / 359 ≈ 0.0056.
RawUsage is the number of CPU minutes the Account or User has used. The RawUsage is also affected by the half-life that is set for the cluster, which is currently 7 days. Thus work done in the last 7 days counts at full cost, work done 14 days ago costs half, work done 21 days ago one-fourth, and so on.
The next column is EffectvUsage. EffectvUsage is the Account's RawUsage divided by the total RawUsage for the cluster. Thus EffectvUsage is the percentage of the cluster the Account has actually used. In this case, the user has used 0.0045% of the cluster.
Finally, we have the Fairshare score. The Fairshare score is calculated using the following formula: f = 2^(-EffectvUsage/NormShares). From this one can see that there are five basic regimes for this score, which are as follows:
- 1.0: Unused. The User has not run any jobs recently.
- 1.0 > f > 0.5: Underutilization. The User is underutilizing their granted Share. For example, when f≈0.71 a lab has recently underutilized their Share of the resources 1:2
- 0.5: Average utilization. The User on average is using exactly as much as their granted Share.
- 0.5 > f > 0: Over-utilization. The User has overused their granted Share. For example, when f=0.25 a lab has recently over utilized their Share of the resources 2:1
- 0: No share left. The User has vastly overused their granted Share. If there is no contention for resources, the jobs will still start.
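The regimes above follow directly from the formula. The sketch below evaluates f = 2^(-EffectvUsage/NormShares) for a few usage levels; the function name fairshare is hypothetical, and the last call uses the crc account's figures from the sample output (EffectvUsage computed as 103440 / 2321244304, roughly 0.00004456).

```shell
# fairshare EFFECTIVE_USAGE NORM_SHARES
# Evaluates f = 2^(-EffectvUsage / NormShares), written with exp and log
# because those functions are available in every POSIX awk.
fairshare() {
    awk -v u="$1" -v s="$2" \
        'BEGIN { printf "%.4f\n", exp(-(u / s) * log(2)) }'
}

fairshare 0          0.005556   # no recent usage
fairshare 0.005556   0.005556   # usage equal to the granted share
fairshare 0.011112   0.005556   # usage at twice the granted share
fairshare 0.00004456 0.005556   # crc account from the sample output
```

The first three calls land on the 1.0, 0.5, and 0.25 points described in the list, and the last one comes out close to the 0.994453 reported for the crc account.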
Since the usage of the cluster varies, the scheduler does not stop Users from using more than their granted Share in their Account. Instead, the scheduler wants to fill idle cycles, so it will take whatever jobs it has available. Thus a User is essentially borrowing computing resource time in the future to use now. This will continue to drive down the User's Fairshare score, but allow jobs for the User to still start. Eventually, another User with a higher Fairshare score will start submitting jobs, and that User's jobs will have a higher priority because they have not used their granted Share.
Job Priority is an integer number that adjudicates the position of a job in the pending queue relative to other jobs. There are 3 components. Each component is multiplied by a weighting factor to have that component be more prominent in the scheduling of jobs.
- Partition: Jobs submitted to an owner group partition receive 20,000 priority versus 400 priority given to jobs in the sixhour partition. This ensures that any job submitted to an owner group partition will always be scheduled before a sixhour job, even if submitted after the sixhour job.
- Fairshare: The fairshare priority is given based on the usage of the cluster of the individual user.
- Age: All jobs once submitted start with a 0 priority for age. The age priority component increases as the job is in the PENDING state waiting for the available resources to become free.
You can view all PENDING jobs and their respective priorities using the sprio command.