techstaff:slurm
Differences
This shows you the differences between two versions of the page.
techstaff:slurm [2018/11/21 11:31] – [$CUDA_VISIBLE_DEVICES] kauffman | techstaff:slurm [2019/10/11 16:54] – kauffman | ||
---|---|---|---|
Line 1: | Line 1: | ||
===== Notice ===== | ===== Notice ===== | ||
- | **2017-08-31**: Configuration change to allow allocation on CPUs and RAM. Please read the 'Default Quota' | + | **2019-10-08**: New compute nodes added under the partition '' |
====== Peanut Job Submission Cluster ====== | ====== Peanut Job Submission Cluster ====== | ||
- | We are currently **alpha** testing and gauging user interest in a cluster of machines that allows for the submission of long running compute jobs. Think of these machines as a dumping ground for discrete computing tasks that might be rude or disruptive to execute on the main (shared) shell servers (i.e., linux1, linux2, linux3). | + | Think of these machines as a dumping ground for discrete computing tasks that might be rude or disruptive to execute on the main (shared) shell servers (i.e., linux1, linux2, linux3). |
For job submission we will be using a piece of software called [[http:// | For job submission we will be using a piece of software called [[http:// | ||
Line 44: | Line 43: | ||
==== Hardware ==== | ==== Hardware ==== | ||
Our cluster contains nodes with the following specs: | Our cluster contains nodes with the following specs: | ||
+ | |||
+ | '' | ||
* 16 Cores (2x 8core 3.1GHz Processors), | * 16 Cores (2x 8core 3.1GHz Processors), | ||
* 64GB RAM | * 64GB RAM | ||
* 2x 500GB SATA 7200RPM in RAID1 | * 2x 500GB SATA 7200RPM in RAID1 | ||
+ | |||
+ | '' | ||
+ | * 24 Cores (2x 12core Intel Xeon Silver 4116 CPU @ 2.10GHz), 48 threads | ||
+ | * 128GB RAM | ||
+ | * OS: 2x 240GB Intel SSD in RAID1 | ||
+ | * /local: 2x 960GB Intel SSD RAID0 | ||
==== Storage ==== | ==== Storage ==== | ||
Line 246: | Line 253: | ||
====== Using the GPU ====== | ====== Using the GPU ====== | ||
- | |||
- | ===== CUDA_VISIBLE_DEVICES ===== | ||
- | Do not set this variable. It will be set for you by SLURM. | ||
- | |||
- | The variable name is actually misleading; since it does NOT mean the amount of devices, but rather the physical device number assigned by the kernel (e.g. / | ||
- | |||
- | For example: If you requested multiple gpu's from SLURM (--gres=gpu: | ||
- | |||
===== GRES Multiple GPU's on one system ===== | ===== GRES Multiple GPU's on one system ===== | ||
Line 340: | Line 339: | ||
- | ===== Paths ===== | + | ===== Environment Variables ===== |
- | You will need to add the following to your '' | + | |
+ | ==== CUDA_HOME, LD_LIBRARY_PATH ==== | ||
+ | |||
+ | Please make sure you set $CUDA_HOME. If you want to take advantage of the CUDNN libraries, you will also need to append / | ||
+ | |||
+ | cuda_version=9.2 | ||
+ | export CUDA_HOME=/ | ||
+ | export LD_LIBRARY_PATH=$LD_LIBRARY_PATH: | ||
+ | |||
+ | Currently we support the same versions of CUDA that the latest version of CUDNN supports. This is not set in stone and we can accommodate most other versions if required; just let techstaff know what your needs are. | ||
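A mistyped or missing $CUDA_HOME tends to surface only once a job is already running, so it can help to check it before submitting. The following is a minimal sketch, not part of the cluster's tooling: the function name `check_cuda_home` is hypothetical, and it only tests that the variable is set and points at an existing directory.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the cluster): sanity-check a CUDA_HOME value.
check_cuda_home() {
    local home="${1-}"
    # An empty value means the variable was never exported.
    if [ -z "$home" ]; then
        echo "CUDA_HOME is not set" >&2
        return 1
    fi
    # The value must name a directory that exists on this machine.
    if [ ! -d "$home" ]; then
        echo "CUDA_HOME does not exist: $home" >&2
        return 1
    fi
    echo "CUDA_HOME looks usable: $home"
}

# Report the current state without aborting the surrounding script.
check_cuda_home "${CUDA_HOME-}" || echo "fix CUDA_HOME before submitting a job"
```

Running this on a compute node rather than a login node is the more meaningful check, since the installed CUDA versions may differ between machines.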
+ | |||
+ | ==== PATH ==== | ||
+ | You may also need to add the following to your '' | ||
export PATH=$PATH:/ | export PATH=$PATH:/ | ||
- | export LD_LIBRARY_PATH=$LD_LIBRARY_PATH=/usr/local/ | + | |
+ | ==== CUDA_VISIBLE_DEVICES ==== | ||
+ | Do not set this variable. It will be set for you by SLURM. | ||
+ | |||
+ | The variable name is misleading: it does NOT give the number of devices, but rather the physical device number assigned by the kernel (e.g. /dev/nvidia2). | ||
+ | |||
+ | For example: If you requested multiple GPUs from SLURM (--gres=gpu: | ||
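To make the distinction concrete, here is a minimal sketch that treats the value as a comma-separated list of device numbers and counts its entries, rather than reading the value itself as a count. The function name `count_visible_devices` and the sample value `2,3` are hypothetical and used only for illustration; inside a real job the variable is set by SLURM and should be read, never assigned.

```shell
#!/bin/bash
# Hypothetical helper: count the entries in a comma-separated device list.
# In a real job, SLURM sets CUDA_VISIBLE_DEVICES; do not set it yourself.
count_visible_devices() {
    local list="${1-}"
    local -a ids
    # An empty or unset list means no devices were assigned.
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    # Split on commas into an array and report its length.
    local IFS=','
    read -r -a ids <<< "$list"
    echo "${#ids[@]}"
}

# Illustration only: a sample value, not taken from a real allocation.
sample="2,3"
echo "device list: $sample, count: $(count_visible_devices "$sample")"
```

Within a job script the same function could be called as `count_visible_devices "${CUDA_VISIBLE_DEVICES-}"`, which leaves the variable untouched.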
Line 402: | Line 419: | ||
STDERR should be blank. | STDERR should be blank. | ||
====== More ====== | ====== More ====== | ||
- | If you feel this documentation is lacking in some way please let techstaff know. Email [[techstaff@cs.uchicago.edu]], | + | If you feel this documentation is lacking in some way please let techstaff know. Email [[techstaff@cs.uchicago.edu]], |
/var/lib/dokuwiki/data/pages/techstaff/slurm.txt · Last modified: 2021/01/06 16:13 by kauffman