techstaff:aicluster

Differences

This shows you the differences between two versions of the page.

techstaff:aicluster [2020/11/11 16:21]
kauffman
techstaff:aicluster [2020/12/10 12:57]
kauffman [Batch]
Line 59: Line 59:
 There is a set of front-end nodes that give you access to the Slurm cluster. You connect through these nodes and must be on them to submit jobs to the cluster.
  
-    fe.ai.cs.uchicago.edu
+    ssh cnetid@fe.ai.cs.uchicago.edu
  
   * Requires a CS account.
-  * ssh cnetid@fe.ai.cs.uchicago.edu 
  
 +==== File Transfer ====
 +You will use the FE nodes to transfer your files onto the cluster storage infrastructure. The network connections on those nodes are 2x 10G each.
 +
 +=== Quota ===
 +  * By default users are given a quota of 20G.
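As a minimal sketch of the transfer described above, the helper below prints an `rsync` command that copies a local directory to your home directory on the cluster through a front-end node. The wiki does not prescribe a transfer tool, so `rsync` is an assumption here, and `cnetid` and `./my_dataset` are placeholder names.

```shell
#!/bin/bash
# Hypothetical helper: print the rsync command for copying a local directory
# to the cluster through the front-end node, so it can be inspected first.
FE_HOST="fe.ai.cs.uchicago.edu"

build_transfer_cmd() {
    # $1: your CNetID, $2: local directory, $3: destination path on the cluster
    # -a preserves permissions and timestamps, -v lists files as they are copied
    echo "rsync -av $2 $1@${FE_HOST}:$3"
}

# Placeholder values; replace them with your own before running the command.
build_transfer_cmd cnetid ./my_dataset '~/'

# Once the files are on the cluster, compare your usage with the quota:
#   du -sh ~
```

Printing the command rather than running it keeps the sketch safe to try anywhere; paste the output into your terminal once it looks right.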
  
 ====== Demo ======
Line 118: Line 122:
 0,1,2,3
 </code>
 +
 +==== Notes on CUDA_VISIBLE_DEVICES ====
 +CUDA_VISIBLE_DEVICES: Lists the relative GPU device numbers available to you.
 +
 +  * This variable should NOT be modified. Ever.
 +  * Relative means that if you requested one GPU, it will show up as 0, even if all other GPUs on the server are being used by others.
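One way to use the variable without modifying it is to read it and count the entries. The function below is an illustrative sketch (the name `count_visible_gpus` is made up for this example); it only inspects the value passed to it.

```shell
#!/bin/bash
# Count the GPUs listed in a CUDA_VISIBLE_DEVICES-style value (e.g. "0,1,2,3").
# The value is only read, never changed.
count_visible_gpus() {
    devices="$1"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Number of entries = number of commas + 1
    commas=$(printf '%s' "$devices" | tr -cd ',' | wc -c)
    echo $((commas + 1))
}

# Inside a job, pass the variable as it was set for you.
count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}"
```

For a job that requested one GPU, the variable is `0` and the count is 1, regardless of which physical GPU was assigned.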
  
  
Line 140: Line 150:
 </code>
  
 +
 +===== Jupyter Notebook Tips =====
 +==== Batch ====
 +The process for a batch job is very similar.
 +
 +jupyter-notebook.sbatch
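 +<code>
 +#!/bin/bash
 +# Jupyter tries to store files under XDG_RUNTIME_DIR; unset it to avoid errors
 +unset XDG_RUNTIME_DIR
 +# IP address of the compute node running this job
 +NODEIP=$(hostname -i)
 +# Random port in the range 1024-33791
 +NODEPORT=$(( $RANDOM + 1024 ))
 +echo "ssh command: ssh -N -L 8888:$NODEIP:$NODEPORT $(whoami)@fe01.ai.cs.uchicago.edu"
 +# Activate your Python virtual environment (here ~/myenv)
 +. ~/myenv/bin/activate
 +jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser
 +</code>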
 +
 +Check the output of your job to find the ssh command to use when accessing your notebook.
 +
 +Make a new ssh connection to tunnel your traffic. The format will be something like:
 +
 +''%%ssh -N -L 8888:###.###.###.###:#### user@fe01.ai.cs.uchicago.edu%%''
 +
 +This command will appear to hang because the -N option tells ssh not to run any commands, including a shell, on the remote machine.
 +
 +Open your local browser and visit: ''%%http://localhost:8888%%''
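To pull the tunnel command out of the job output, something like the following may help. It is a sketch that assumes the job writes to Slurm's default output file, `slurm-<jobid>.out`, in the directory you submitted from; the function name and the job ID are made up for illustration.

```shell
#!/bin/bash
# Print the tunnel command that jupyter-notebook.sbatch echoed into a job
# output file, with the "ssh command: " prefix removed.
find_ssh_command() {
    # $1: path to the job's output file
    sed -n 's/^ssh command: //p' "$1"
}

# Typical use (12345 is a made-up job ID):
#   sbatch jupyter-notebook.sbatch
#   find_ssh_command slurm-12345.out
```

Run the printed command on your local machine, not on the front-end node.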
 +==== Interactive ====
 +  - ''%%srun --pty bash%%'' run an interactive job
 +  - ''%%unset XDG_RUNTIME_DIR%%'' Jupyter tries to use the value of this environment variable to store some files; by default it is set to a value that causes errors when trying to run Jupyter Notebook.
 +  - ''%%export NODEIP=$(hostname -i)%%'' get the IP address of the node you are using
 +  - ''%%export NODEPORT=$(( $RANDOM + 1024 ))%%'' get a random port at or above 1024
 +  - ''%%echo $NODEIP:$NODEPORT%%'' echo the env var values to use later
 +  - ''%%jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser%%'' start the Jupyter notebook
 +  - Make a new ssh connection with a tunnel to access your notebook
 +  - ''%%ssh -N -L 8888:$NODEIP:$NODEPORT user@fe01.ai.cs.uchicago.edu%%'' substituting the actual values for the variables
 +  - This creates an ssh tunnel on your local machine that forwards traffic sent to ''%%localhost:8888%%'' to ''%%$NODEIP:$NODEPORT%%''. The command will appear to hang because the -N option tells ssh not to run any commands, including a shell, on the remote machine.
 +  - Open your local browser and visit: ''%%http://localhost:8888%%''
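Because the tunnel command needs the literal values rather than the variable names, a small helper on your local machine can assemble it from what the compute node echoed. This is a sketch only: `build_tunnel_cmd`, the user name, and the address below are hypothetical.

```shell
#!/bin/bash
# Build the local tunnel command from the values echoed on the compute node.
FE_NODE="fe01.ai.cs.uchicago.edu"

build_tunnel_cmd() {
    # $1: user name, $2: node IP, $3: node port
    # $RANDOM is 0..32767 in bash, so $RANDOM + 1024 falls in 1024..33791
    if [ "$3" -lt 1024 ] || [ "$3" -gt 33791 ]; then
        echo "unexpected port: $3" >&2
        return 1
    fi
    echo "ssh -N -L 8888:$2:$3 $1@${FE_NODE}"
}

# Made-up values; substitute what `echo $NODEIP:$NODEPORT` printed on the node.
build_tunnel_cmd cnetid 10.0.0.5 20000
```

The range check catches a mistyped port before ssh is started, since any port produced by the steps above must lie between 1024 and 33791.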
  
/var/lib/dokuwiki/data/pages/techstaff/aicluster.txt · Last modified: 2021/01/06 16:11 by kauffman