- `har_priority`
- `normalized_priority`
- `weighted_normalized_priority`

These values are tracked in the "AI Cluster Tracking and Priority calculation" spreadsheet. AI Cluster committee members have access.
Slurm comes with built-in tools to calculate fair-share priorities without anyone needing to do anything special. The cluster uses a fair-share algorithm with multiple factors to adjust job priorities.
Generally, Slurm uses the following formula to determine a job's priority:
```
Job_priority =
    site_factor +
    (PriorityWeightAge) * (age_factor) +
    (PriorityWeightAssoc) * (assoc_factor) +
    (PriorityWeightFairshare) * (fair-share_factor) +
    (PriorityWeightJobSize) * (job_size_factor) +
    (PriorityWeightPartition) * (partition_factor) +
    (PriorityWeightQOS) * (QOS_factor) +
    SUM(TRES_weight_cpu * TRES_factor_cpu,
        TRES_weight_<type> * TRES_factor_<type>,
        ...) -
    nice_factor
```
The terms on the left of each product, which start with `PriorityWeight*`, can be and are set in the Slurm cluster configuration file (found on all nodes at /etc/slurm-llnl/slurm.conf). The factors on the right are calculated based on previous job submissions for that particular user.
```
fe01:~$ cat /etc/slurm-llnl/slurm.conf | grep "^Priority"
PriorityType=priority/multifactor
PriorityDecayHalfLife=08:00:00
PriorityMaxAge=5-0
PriorityWeightFairshare=500000
PriorityWeightPartition=100000
PriorityWeightQOS=0
PriorityWeightJobSize=0
PriorityWeightAge=0
PriorityFavorSmall=YES
```
*Note that this example may not be up to date when you read this.*
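To make the formula concrete, here is a minimal Python sketch of the multifactor sum using the weights from the config above. The factor values are made-up placeholders (Slurm computes the real ones internally, each between 0.0 and 1.0), and the assoc and TRES terms are left out since their weights are not set in that config.

```python
# Hypothetical illustration of the multifactor priority sum.
# The *_factor values are made-up placeholders in [0.0, 1.0];
# Slurm computes the real ones from job age, usage history, etc.

weights = {
    "PriorityWeightFairshare": 500000,
    "PriorityWeightPartition": 100000,
    "PriorityWeightQOS": 0,
    "PriorityWeightJobSize": 0,
    "PriorityWeightAge": 0,
}

factors = {
    "PriorityWeightFairshare": 0.73,  # fair-share factor (example value)
    "PriorityWeightPartition": 0.05,  # partition factor (example value)
    "PriorityWeightQOS": 0.0,
    "PriorityWeightJobSize": 0.0,
    "PriorityWeightAge": 0.0,
}

site_factor = 0
nice_factor = 0

job_priority = site_factor + sum(
    weights[name] * factors[name] for name in weights
) - nice_factor

print(f"job priority: {int(job_priority)}")  # 365000 + 5000 = 370000
```

With `PriorityWeightFairshare` an order of magnitude larger than `PriorityWeightPartition`, a user's fair-share history dominates their job priority, with the partition providing a smaller adjustment.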
Partitions are configured per group (`$group`). Here is a version of the partition configuration as it stands now (2021-02-10):
```
PartitionName=general Nodes=a[001-008]
#PartitionName=cdac-own Nodes=a[005-008] AllowGroups=cdac Priority=100
PartitionName=cdac-contrib Nodes=a[001-008] AllowGroups=cdac Priority=5
```
Partition | Description | Priority |
---|---|---|
general | For all users | 0 |
${group}-own | Machines $group has donated. Enabled when asked. | 100 |
${group}-contrib | A method to give slightly higher job priority to groups who have donated but do not own machines. | Variable based on spreadsheet calculation. |
The key thing to notice before you continue reading is that nodes can be added to multiple partitions. Both `general` and `cdac-contrib` contain all of the nodes, so jobs submitted through either partition can run on any node, just with different priorities.
We do the following calculation to determine a `*-contrib` partition's usage over the past 30 days in comparison to total cluster usage.
```
partition usage total time in seconds for 30 days
-------------------------------------------------------  = percent used
all partition usage total time in seconds for 30 days
```
The resulting percentage is expressed as an integer.
There is a Python script that does this and sends techstaff a report. The repo is currently not available for everyone to see, but I think it should be eventually. In the meantime you can take a look at it on the front-end nodes (/usr/local/slurm-tools/cluster_partition_usage.py).
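The core of that calculation is just a ratio of usage time. The sketch below is a simplified, hypothetical version of the arithmetic, not the actual cluster_partition_usage.py; the function name and the rounding behavior are assumptions for illustration.

```python
# Simplified, hypothetical version of the percent-used calculation.
# The real script gathers these totals from Slurm accounting data.

def percent_used(partition_seconds: int, all_partitions_seconds: int) -> int:
    """Return a partition's share of total cluster usage over the
    30-day window as an integer percentage."""
    if all_partitions_seconds == 0:
        return 0
    # Rounded here for illustration; the real script may round differently.
    return round(100 * partition_seconds / all_partitions_seconds)

# Example: a *-contrib partition used 1.2M seconds out of 10M total.
print(percent_used(1_200_000, 10_000_000))  # -> 12
```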
You'll see on the spreadsheet that we subtract `percent used` from 100, ensure it's positive, and call that "idleness".
The total amount of money contributed and "idleness" are the key factors in determining the priority of a group's 'contrib' partition.
This calculation will be run once a month, and the relevant group's `${group}-contrib` priority will be updated to reflect the past month's usage.
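Putting those two steps together, the "idleness" part looks roughly like the sketch below. The weighting against money contributed lives in the spreadsheet and is not reproduced here; the function name is a placeholder.

```python
# Hypothetical sketch of the "idleness" step described above:
# subtract percent used from 100 and clamp so it never goes negative.

def idleness(percent_used: int) -> int:
    """'Idleness' as used in the spreadsheet: 100 minus the partition's
    percent of total cluster usage, kept non-negative."""
    return max(0, 100 - percent_used)

# A contrib partition that consumed 12% of cluster time over the
# past 30 days is treated as 88% "idle" for priority purposes.
print(idleness(12))  # -> 88
```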
Note that the term "idleness" should not be taken literally. I don't know of a way to calculate true idleness, but I believe the current calculation reflects the intended usage.
Since I'm still working on it, I don't guarantee any uptime yet. Mainly I need to make sure TRES tracking is working like we want. This will involve restarting slurmd and slurmctld, which will kill running jobs.