techstaff:slurm
Differences
This shows you the differences between two versions of the page.
techstaff:slurm [2018/05/04 12:47] – [Partitions / Queues] kauffman | techstaff:slurm [2018/12/18 15:44] – [More] kauffman | ||
---|---|---|---|
Line 112: | Line 112: | ||
| ^ SLURM ^ Example ^ | | ^ SLURM ^ Example ^ | ||
^ Submit a batch serial job | sbatch | sbatch runscript.sh | | ^ Submit a batch serial job | sbatch | sbatch runscript.sh | | ||
- | ^ Run a script | + | ^ Run a script |
^ Kill a job | scancel | scancel 4585 | | ^ Kill a job | scancel | scancel 4585 | | ||
^ View status of queues | squeue | squeue -u cnetid | | ^ View status of queues | squeue | squeue -u cnetid | | ||
Line 320: | Line 320: | ||
+ | ==== Checking how many Generic RESources are being consumed ==== | ||
- | ===== Paths ===== | + | Simply use the '' |
- | You will need to add the following to your '' | + | < |
+ | $ squeue -O username, | ||
+ | USER NODELIST | ||
+ | someusername | ||
+ | otherusername | ||
+ | ... | ||
+ | </ | ||
+ | |||
+ | |||
+ | ===== Environment Variables | ||
+ | |||
+ | ==== CUDA_HOME, LD_LIBRARY_PATH ==== | ||
+ | |||
+ | Please make sure you set $CUDA_HOME. If you want to take advantage of the CUDNN libraries, you will also need to append / | ||
+ | |||
+ | cuda_version=9.2 | ||
+ | export CUDA_HOME=/ | ||
+ | export LD_LIBRARY_PATH=$LD_LIBRARY_PATH: | ||
+ | |||
+ | Currently we support the same versions of CUDA that the latest version of CUDNN supports. This is not set in stone, and we can accommodate most other versions if required; just let techstaff know what you need. | ||
+ | |||
+ | ==== PATH ==== | ||
+ | You may also need to add the following to your '' | ||
export PATH=$PATH:/ | export PATH=$PATH:/ | ||
- | export LD_LIBRARY_PATH=$LD_LIBRARY_PATH=/usr/local/ | + | |
+ | ==== CUDA_VISIBLE_DEVICES ==== | ||
+ | Do not set this variable. It will be set for you by SLURM. | ||
+ | |||
+ | The variable name is misleading: it does NOT hold the number of devices, but rather the physical device numbers assigned by the kernel (e.g. /dev/nvidia2). | ||
+ | |||
+ | For example: if you requested multiple GPUs from SLURM (--gres=gpu: |
Line 383: | Line 412: | ||
STDERR should be blank. | STDERR should be blank. | ||
====== More ====== | ====== More ====== | ||
- | If you feel this documentation is lacking in some way please let techstaff know. Email [[techstaff@cs.uchicago.edu]], | + | If you feel this documentation is lacking in some way please let techstaff know. Email [[techstaff@cs.uchicago.edu]], |
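
To illustrate the CUDA_VISIBLE_DEVICES note added in this revision, here is a minimal sketch of how a job script might read the variable without ever assigning it. The helper name `count_visible_gpus` is hypothetical and not part of SLURM or CUDA. The sketch assumes the variable holds a comma-separated list of device numbers, which is how it is commonly populated.

```shell
#!/bin/sh
# Hypothetical helper: count the entries in a CUDA_VISIBLE_DEVICES-style
# value (assumed to be a comma-separated list of device numbers).
count_visible_gpus() {
    devices="${1:-}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Split on commas by temporarily changing IFS, then count the fields.
    old_ifs=$IFS
    IFS=','
    set -- $devices
    IFS=$old_ifs
    echo "$#"
}

# Inside a job, only read the value SLURM provided; do not export it yourself.
echo "Assigned device numbers: ${CUDA_VISIBLE_DEVICES:-none}"
echo "Number of GPUs: $(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```

With a value such as `2,3`, the helper reports 2 GPUs, while the numbers themselves identify which physical devices were assigned.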
/var/lib/dokuwiki/data/pages/techstaff/slurm.txt · Last modified: 2021/01/06 16:13 by kauffman