slurm:ai
Differences
This shows you the differences between two versions of the page.
slurm:ai [2021/01/06 16:10] – kauffman | slurm:ai [2021/06/25 14:51] – grzenda | ||
---|---|---|---|
Line 5: | Line 5: | ||
Feedback is requested: | Feedback is requested: | ||
- | | + | |
- | Prior knowledge of Slurm is preferred at this stage of testing. | ||
- | The information from the older cluster mostly applies and I suggest you read that documentation: | + | The information from the older cluster mostly applies and I suggest you read that documentation: |
====== Infrastructure ====== | ====== Infrastructure ====== | ||
- | Summary of nodes installed on the cluster | + | Summary of nodes installed on the cluster. |
+ | |||
+ | * [[ http:// | ||
+ | * [[ https:// | ||
+ | * Use '' | ||
===== Computer/ | ===== Computer/ | ||
Line 26: | Line 29: | ||
* 384G RAM | * 384G RAM | ||
* 4x Nvidia Quadro RTX 8000 | * 4x Nvidia Quadro RTX 8000 | ||
+ | |||
+ | * 3x nodes | ||
+ | * 2x AMD EPYC 7302 16-Core Processor | ||
+ | * 512G RAM | ||
+ | * 4x Nvidia A40 | ||
+ | * Note that not all nodes are online yet. | ||
* all: | * all: | ||
Line 39: | Line 48: | ||
* / | * / | ||
* We intend to set user quotas; however, there are no quotas right now. | * We intend to set user quotas; however, there are no quotas right now. | ||
- | * / | + | * / |
* Lives on the home directory server. | * Lives on the home directory server. | ||
* Idea would be to create a dataset with a quota for people to use. | * Idea would be to create a dataset with a quota for people to use. | ||
Line 53: | Line 62: | ||
* zfs mirror with previous snapshots of ' | * zfs mirror with previous snapshots of ' | ||
* NOT a backup. | * NOT a backup. | ||
- | * Not enabled yet. | + | |
Line 123: | Line 132: | ||
</ | </ | ||
- | ==== Notes on CUDA_VISIBLE_DEVICES ==== | ||
- | CUDA_VISIBLE_DEVICES: | ||
- | * This variable should NOT be modified. Ever. | ||
- | * Relative means that if you requested one gpu it will show up as 0. Even if all other gpus on the server are being used by others. | ||
- | |||
- | ===== Fairshare/ | ||
- | By default all usage is tracked and charged to a user's default account. A fairshare value is computed and used in prioritizing a job on submission. | ||
- | |||
- | Details are being worked out for anyone that donates to the cluster. This will be some sort of tiered system where you get to use a higher priority when you need it. | ||
- | You will need to charge an account on job submission '' | ||
Line 177: | Line 176: | ||
==== Interactive ==== | ==== Interactive ==== | ||
- '' | - '' | ||
- | - '' | + | - '' |
- '' | - '' | ||
- '' | - '' | ||
- '' | - '' | ||
- | - '' | + | - '' |
- Make a new ssh connection with a tunnel to access your notebook | - Make a new ssh connection with a tunnel to access your notebook | ||
- '' | - '' | ||
- This will make an ssh tunnel on your local machine that forwards traffic sent to '' | - This will make an ssh tunnel on your local machine that forwards traffic sent to '' | ||
- Open your local browser and visit: '' | - Open your local browser and visit: '' | ||
+ | |||
+ | |||
+ | ====== Contribution Policy ====== | ||
+ | This section can be ignored by most people. [[techstaff: | ||
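As an illustration of the tunnel step in the Interactive section above, here is a minimal Python sketch that builds an ''ssh'' local port-forward command and the matching browser URL. The exact hosts, ports, and commands on the wiki page are cut off in this diff view, so everything concrete here is an assumption: ''login.example.org'', ''node01'', and port ''8888'' are hypothetical placeholders, and the helper names ''tunnel_command'' and ''browser_url'' are invented for this example. Only the standard OpenSSH options ''-N'' (no remote command) and ''-L'' (local forward) are relied on.

```python
import shlex


def tunnel_command(login_host, compute_host, remote_port, local_port=None, user=None):
    """Build an `ssh -N -L` command as an argument list.

    Traffic sent to localhost:local_port on your machine is forwarded,
    through login_host, to compute_host:remote_port.
    All host names passed in are the caller's; none come from the wiki page.
    """
    if local_port is None:
        # Assumption: reuse the remote port locally unless told otherwise.
        local_port = remote_port
    for port in (local_port, remote_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    target = f"{user}@{login_host}" if user else login_host
    forward = f"{local_port}:{compute_host}:{remote_port}"
    return ["ssh", "-N", "-L", forward, target]


def browser_url(local_port):
    """URL to open locally once the tunnel is up."""
    return f"http://localhost:{local_port}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = tunnel_command("login.example.org", "node01", 8888, user="alice")
    print(shlex.join(cmd))
    print(browser_url(8888))
```

Returning the command as a list rather than a single string keeps it safe to pass directly to ''subprocess.run'' without shell quoting issues; ''shlex.join'' is used only to display it.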
/var/lib/dokuwiki/data/pages/slurm/ai.txt · Last modified: 2022/04/04 10:58 by chaochunh