techstaff:slurm
Differences
This shows you the differences between two versions of the page.
techstaff:slurm [2018/02/22 17:21] – kauffman | techstaff:slurm [2018/05/04 12:32] – kauffman | ||
---|---|---|---|
Line 20: | Line 20: | ||
===== Mailing List ===== | ===== Mailing List ===== | ||
- | If you are going to be a user of this cluster please sign up for the mailing list. Downtime and other relevant | + | If you are going to be a user of this cluster please sign up for the mailing list. Downtime and other relevant |
[[ https:// | [[ https:// | ||
Line 64: | Line 64: | ||
Request interactive shell | Request interactive shell | ||
< | < | ||
+ | |||
+ | Create a directory on the scratch partition if you don't already have one: | ||
+ | < | ||
Change into my scratch directory: | Change into my scratch directory: | ||
- | < | + | < |
Get the files I need: | Get the files I need: | ||
< | < | ||
- | user@research2:/ | + | user@slurm1:/ |
foo | foo | ||
</ | </ | ||
Check that the file now exists: | Check that the file now exists: | ||
< | < | ||
- | user@research2:/ | + | user@slurm1:/ |
-rw------- 1 user user 105121 Dec 29 2015 foo | -rw------- 1 user user 105121 Dec 29 2015 foo | ||
</ | </ | ||
Line 242: | Line 245: | ||
====== Using the GPU ====== | ====== Using the GPU ====== | ||
+ | |||
+ | ===== GRES: Multiple GPUs on one system ===== | ||
+ | GRES: Generic Resource. As of 2018-05-04 these only include GPUs. | ||
+ | |||
+ | Jobs will not be allocated any generic resources unless specifically requested at job submit time using the '' | ||
+ | < | ||
+ | |||
+ | Jobs will be allocated specific generic resources as needed to satisfy the request. If the job is suspended, those resources do not become available for use by other jobs. | ||
+ | |||
+ | Job steps can be allocated generic resources from those allocated to the job using the '' | ||
+ | |||
+ | ==== Ok, but I don't want to read the wall of text above ==== | ||
+ | Fine. | ||
+ | |||
+ | The '' | ||
+ | |||
+ | < | ||
+ | --gres=gpu: | ||
+ | # Please try to limit yourself to one GPU per person. | ||
+ | </ | ||
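As a rough sketch of how such a request might be assembled in a script, the helper below builds an `srun` argument list without submitting anything. It assumes the standard Slurm `--gres=gpu:N` syntax; the function name `build_srun_command`, its parameters, and the default `bash` command are hypothetical and only for illustration.

```python
import shlex


def build_srun_command(num_gpus, partition=None, command=("bash",)):
    """Return an srun argument list requesting num_gpus GPUs (hypothetical helper)."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    args = ["srun"]
    if partition:
        # -p selects the partition to run in.
        args += ["-p", partition]
    if num_gpus > 0:
        # Without a --gres request, no generic resources are allocated.
        args.append(f"--gres=gpu:{num_gpus}")
    # --pty runs the command in a pseudo-terminal (interactive use).
    args += ["--pty", *command]
    return args


# Print the command line instead of running it.
print(shlex.join(build_srun_command(1, partition="titan")))
```

Passing `num_gpus=0` leaves the `--gres` option out entirely, which corresponds to the case where no GPU is allocated.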
+ | |||
+ | Example when using tensorflow: | ||
+ | |||
+ | Give the file ' | ||
+ | Depends on: | ||
+ | '' | ||
+ | '' | ||
+ | < | ||
+ | # | ||
+ | from tensorflow.python.client import device_lib | ||
+ | print(device_lib.list_local_devices()) | ||
+ | </ | ||
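The device list printed by the script also contains CPU entries. A minimal sketch of narrowing it down to GPUs follows, assuming each entry exposes `name` and `device_type` attributes as the objects returned by `device_lib.list_local_devices()` do. The helper name `gpu_devices` and the stand-in sample data are hypothetical, so the sketch runs without TensorFlow installed.

```python
from types import SimpleNamespace


def gpu_devices(devices):
    """Keep only the entries whose device_type is "GPU"."""
    return [d for d in devices if d.device_type == "GPU"]


# Stand-in objects mimicking two fields of the real device entries
# (illustrative values, not output from the cluster).
sample = [
    SimpleNamespace(name="/device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/device:GPU:0", device_type="GPU"),
]

for dev in gpu_devices(sample):
    print(dev.name)
```

On a node with TensorFlow available, the same helper could be called as `gpu_devices(device_lib.list_local_devices())`; an empty result would suggest that no GPU was allocated to the job.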
+ | |||
+ | Here we can see that no GPU was allocated to us because we did not specify the '' | ||
+ | < | ||
+ | kauffman3@bulldozer: | ||
+ | kauffman3@gpu3: | ||
+ | kauffman3@gpu3: | ||
+ | </ | ||
+ | |||
+ | If we request only 1 GPU: | ||
+ | < | ||
+ | kauffman3@bulldozer: | ||
+ | kauffman3@gpu3: | ||
+ | physical_device_desc: | ||
+ | </ | ||
+ | |||
+ | If we request 2 GPUs: | ||
+ | < | ||
+ | kauffman3@bulldozer: | ||
+ | kauffman3@gpu3: | ||
+ | physical_device_desc: | ||
+ | physical_device_desc: | ||
+ | </ | ||
+ | |||
+ | If we request more GPUs than are available: | ||
+ | < | ||
+ | kauffman3@bulldozer: | ||
+ | srun: error: Unable to allocate resources: Requested node configuration is not available | ||
+ | </ | ||
+ | |||
+ | ==== Cool, but how do I know where and what resources are available ==== | ||
+ | Turns out the '' | ||
+ | < | ||
+ | $ sinfo -O partition, | ||
+ | PARTITION | ||
+ | debug* | ||
+ | general | ||
+ | pascal | ||
+ | titan | ||
+ | </ | ||
+ | |||
+ | FEATURES: Just an arbitrary string in the configuration file that defines a node. Techstaff tries to make it carry useful information, but nothing enforces that. | ||
+ | |||
+ | GRES: Don't depend on this being accurate. It will, however, give you a clue as to how many generic resources are in a partition. | ||
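To turn a GRES column value into a number, something like the following sketch may help. It assumes GRES strings of the usual Slurm form `name:count` or `name:type:count`, comma-separated, possibly with a parenthesised suffix; the function name `count_gres` and the example strings are hypothetical and are not taken from this cluster's output.

```python
def count_gres(gres_field, name="gpu"):
    """Sum the counts of `name` entries in a GRES string such as "gpu:2"."""
    total = 0
    for entry in gres_field.split(","):
        # Drop any parenthesised suffix; "(null)" becomes an empty string.
        entry = entry.strip().split("(")[0]
        parts = entry.split(":")
        if parts[0] != name:
            continue
        # A bare name with no count is treated as one resource.
        count = parts[-1] if len(parts) > 1 else "1"
        # Entries whose last field is not a plain number are skipped.
        if count.isdigit():
            total += int(count)
    return total


print(count_gres("gpu:titan:4"))
```

Since the GRES column is only a hint, the result is best treated as an estimate of what a partition offers rather than a guarantee.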
+ | |||
+ | |||
+ | |||
===== Paths ===== | ===== Paths ===== | ||
You will need to add the following to your $PATH and $LD_LIBRARY_PATH. | You will need to add the following to your $PATH and $LD_LIBRARY_PATH. |
/var/lib/dokuwiki/data/pages/techstaff/slurm.txt · Last modified: 2021/01/06 16:13 by kauffman