techstaff:aicluster-admin
Differences
This shows you the differences between two versions of the page.
techstaff:aicluster-admin [2021/02/10 13:27] – [Spreadsheet] kauffman | techstaff:aicluster-admin [2021/02/23 19:58] (current) – kauffman | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== AI Cluster Policy Description ====== | ====== AI Cluster Policy Description ====== | ||
- | |||
- | After various iterations we (Bob, Har, and I) believe we have an | ||
- | implementation of the policy that meets the requirements discussed | ||
- | previously. | ||
- | |||
===== TODO ===== | ===== TODO ===== | ||
- There are multiple methods used to calculate priority reflected on the spreadsheet. | - There are multiple methods used to calculate priority reflected on the spreadsheet. | ||
- Double check the math for correctness. | - Double check the math for correctness. | ||
- Does the math reflect our intent? I think it does. | - Does the math reflect our intent? I think it does. | ||
- | - Multiple methods in calculating priority are reflected (blue to purple cells: har_priority, | + | - Multiple methods in calculating priority are reflected (blue to purple cells: |
- Choose one calculation to use. This decision doesn' | - Choose one calculation to use. This decision doesn' | ||
- If you have a suggestion: Please show us the work in a new column or by cloning the sheet. | - If you have a suggestion: Please show us the work in a new column or by cloning the sheet. | ||
- | ===== Tracking ===== | + | ===== Contribution |
[[https:// | [[https:// | ||
Line 20: | Line 15: | ||
==== Sheet usage ==== | ==== Sheet usage ==== | ||
- | | + | |
- | | + | |
- | | + | |
- | - The group ' | + | * contributions |
- | - Red: Do not edit | + | |
- | - Green: user input (This will be Techstaff 95% of the time) | + | |
- | - `groups` sheet: | + | |
* calculates contribution amount for use in `contrib-priority`. | * calculates contribution amount for use in `contrib-priority`. | ||
* tracks group name and primary owner | * tracks group name and primary owner | ||
- | | + | |
+ | * All contributions will get entered here. | ||
+ | * Hardware contributions are converted to USD by techstaff. A receipt of the purchase is a good starting point. | ||
+ | * The group ' | ||
+ | * `contrib-priority` calculation references contrib amounts calculated in `groups`. | ||
- | ===== Details | + | ===== Understanding Slurm Fairshare and Priority/ |
- | By default the cluster uses a [[https:// | + | Slurm comes with built-in tools to calculate fair-share priorities without anyone needing to do anything special. The cluster uses a [[https:// |
+ | ==== How Slurm calculates Job priority ==== | ||
Generally Slurm will use this formula to determine a job's priority. | Generally Slurm will use this formula to determine a job's priority. | ||
< | < | ||
Line 53: | Line 50: | ||
</ | </ | ||
- | The factors on the left that start with '' | + | The factors on the left that start with '' |
< | < | ||
Line 67: | Line 64: | ||
PriorityFavorSmall=YES | PriorityFavorSmall=YES | ||
</ | </ | ||
- | **Note that this example may not be up to date when you read this. | + | *Note that this example may not be up to date when you read this. |
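As a rough illustration of the weighted-sum idea behind the priority formula, here is a minimal Python sketch. It assumes a simplified form of Slurm's multifactor calculation: each factor is a float between 0.0 and 1.0, it is multiplied by its configured weight, and the products are summed. The function name, the factor names, and every number below are hypothetical and are not taken from our slurm.conf.

```python
def job_priority(factors, weights):
    """Simplified weighted sum: each factor (0.0-1.0) times its integer weight.

    Assumption: this ignores terms such as TRES weights, the site factor,
    and the nice value that real Slurm also takes into account.
    """
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} must be between 0.0 and 1.0")
    # A factor with no configured weight contributes nothing.
    total = sum(weights.get(name, 0) * value for name, value in factors.items())
    return int(total)


# Hypothetical weights and factors, for illustration only.
example_weights = {"age": 1000, "fairshare": 10000, "jobsize": 1000, "partition": 10000}
example_factors = {"age": 0.5, "fairshare": 0.25, "jobsize": 1.0, "partition": 0.5}

print(job_priority(example_factors, example_weights))
```

The point of the sketch is that a larger weight makes its factor dominate the result, which is why the partition settings described on this page can shift a job's position in the queue.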
- | We adjust those priorities with partitions for those who have donated either monetarily or with hardware. Hardware donations get converted to a monetary value when logged on the spreadsheet. | + | ===== How we modify job priority |
- | + | ||
- | Every contribution gets assigned a POSIX group, hereafter referred | + |
+ | * We adjust those priorities with partitions for those who have donated either monetarily or with hardware. Hardware donations are converted to a monetary value when logged on the spreadsheet. | ||
+ | * Every contribution gets assigned a POSIX group, hereafter referred to as '' | ||
Line 78: | Line 75: | ||
< | < | ||
PartitionName=general Nodes=a[001-008] | PartitionName=general Nodes=a[001-008] | ||
- | PartitionName=cdac-own Nodes=a[005-008] AllowGroups=cdac Priority=100 | + | #PartitionName=cdac-own Nodes=a[005-008] AllowGroups=cdac Priority=100 |
PartitionName=cdac-contrib Nodes=a[001-008] AllowGroups=cdac Priority=5 | PartitionName=cdac-contrib Nodes=a[001-008] AllowGroups=cdac Priority=5 | ||
</ | </ | ||
- | ^Partition^ Description^ | + | ^Partition^Description^Priority^ |
- | |general| For all users| | + | |general| For all users| 0 | |
- | |${group}-own | Machines $group has donated | | + | |${group}-own | Machines $group has donated. Enabled when asked. | 100 | |
- | |${group}-contrib | A method to give slightly higher job priority to groups who have donated but do not own machines.| | + | |${group}-contrib | A method to give slightly higher job priority to groups who have donated but do not own machines.| Variable based on spreadsheet calculation. | |
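To make the `${group}-contrib` naming pattern concrete, the following Python sketch builds a partition line in the same shape as the `cdac-contrib` example above. The helper name `contrib_partition_line` is hypothetical; it is not an existing techstaff tool, and the 0-100 range check simply mirrors the normalization done in the spreadsheet.

```python
def contrib_partition_line(group, nodes, priority):
    """Build a slurm.conf line for a group's contrib partition.

    Hypothetical helper for illustration; the keys used
    (PartitionName, Nodes, AllowGroups, Priority) are the ones that
    appear in the example configuration on this page.
    """
    if not 0 <= priority <= 100:
        raise ValueError("priority is expected to be normalized to 0-100")
    return (
        f"PartitionName={group}-contrib Nodes={nodes} "
        f"AllowGroups={group} Priority={priority}"
    )


print(contrib_partition_line("cdac", "a[001-008]", 5))
```

Called with the values from the example configuration, this reproduces the `cdac-contrib` line shown there.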
+ | The key thing to notice before you continue reading is that nodes can be added to multiple partitions. '' | ||
- | The key thing to notice before you continue reading is that nodes can be | ||
- | added to multiple partitions. | ||
- | ' | + | ==== Calculating |
- | priorities. | + | |
- | Priority is normalized in the sheet to be 0-100. | + | We do the following calculation to determine the '' |
- | + | ||
- | Understanding the partition configuration: | + | |
- | + | ||
- | - All users get access to partition '' | + | |
- | - Group ' | + | |
- | - ' | + | |
- | + | ||
- | + | ||
- | We do the following calculation to determine the '' | + | |
< | < | ||
Line 113: | Line 99: | ||
The percent will end up as an integer. | The percent will end up as an integer. | ||
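A minimal Python sketch of a percent-style normalization follows. It assumes the simplest possible reading: a group's percent is its contribution divided by the sum of all contributions, scaled to 0-100 and rounded to an integer. The spreadsheet is the authority and may apply further adjustments; the function name, group names, and dollar amounts here are made up for illustration.

```python
def contrib_percent(group_amount, all_amounts):
    """Return a group's share of total contributions as an integer 0-100.

    Assumption: plain share-of-total; the real sheet may differ.
    """
    total = sum(all_amounts)
    if total <= 0:
        raise ValueError("total contributions must be positive")
    return round(100 * group_amount / total)


# Hypothetical contributions in USD.
amounts = {"group_a": 30000, "group_b": 10000}

for name, amount in amounts.items():
    print(name, contrib_percent(amount, amounts.values()))
```

Rounding to an integer matters because the result is used as a partition priority, which is written as a whole number in the configuration.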
+ | There is a [[https:// | ||
- | You'll see on the spreadsheet we subtract | + |
+ | You'll see on the spreadsheet we subtract | ||
Total amount of money contributed and " | Total amount of money contributed and " | ||
Line 121: | Line 109: | ||
- | Note that the term " | + | Note that the term " |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
/var/lib/dokuwiki/data/attic/techstaff/aicluster-admin.1612985264.txt.gz · Last modified: 2021/02/10 13:27 by kauffman