slurm:ai

Differences

This shows you the differences between two versions of the page.

slurm:ai [2021/03/02 11:40] – [Contribution Policy] kauffman
slurm:ai [2022/04/04 10:58] (current) – fix typos and add code snippet for interactive jupyter notebook chaochunh
Line 1: Line 1:
 ====== AI Cluster - Slurm ====== ====== AI Cluster - Slurm ======
-Cluster is up and running now. Anyone with CS account who wishes to test it out should do so.
+Please send in a ticket requesting to be added if it is your first time using the AI cluster.
  
  
Line 15: Line 15:
 Summary of nodes installed on the cluster. Summary of nodes installed on the cluster.
  
-[[ http://monitor.ai.cs.uchicago.edu|Ganglia Monitoring ]]
+  * [[ http://monitor.ai.cs.uchicago.edu|Ganglia Monitoring ]]
 +  * [[ https://monitor2.ai.cs.uchicago.edu|Grafana Graphs ]]
 +    * Use ''%%guest%%'' as both the username and the password to log in.
  
 ===== Computer/GPU Nodes ===== ===== Computer/GPU Nodes =====
Line 27: Line 29:
     * 384G RAM     * 384G RAM
     * 4x Nvidia Quadro RTX 8000     * 4x Nvidia Quadro RTX 8000
 +
 +  * 3x nodes
 +    * 2x AMD EPYC 7302 16-Core Processor
 +    * 512G RAM
 +    * 4x Nvidia A40
  
   * all:   * all:
Line 39: Line 46:
     * uplink to cluster network: 2x 25G     * uplink to cluster network: 2x 25G
     * /home/<username>     * /home/<username>
-      * We intend to set user quotas, however, there are no quotas right now.
+      * 20G quota per user.
     * /net/projects:     * /net/projects:
       * Lives on the home directory server.       * Lives on the home directory server.
Line 58: Line 65:
  
 ====== Login ====== ====== Login ======
-There are a set of front end nodes that give you access to the Slurm cluster. You will connect through these nodes and need to be on these nodes to submit jobs to the cluster. 
  
-    ssh cnetid@fe.ai.cs.uchicago.edu
+Anyone with a CS account who has previously sent in a ticket requesting access is allowed to log in.
  
-  * Requires CS account.
+There is a set of front end nodes that give you access to the Slurm cluster. You connect through these nodes and need to be on them to submit jobs to the cluster.
  
 +    ssh cnetid@fe.ai.cs.uchicago.edu
 ==== File Transfer ==== ==== File Transfer ====
 You will use the FE nodes to transfer your files onto the cluster storage infrastructure. The network connections on those nodes are 2x 10G each. You will use the FE nodes to transfer your files onto the cluster storage infrastructure. The network connections on those nodes are 2x 10G each.
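A transfer through the front end nodes could look like the following. This is a minimal sketch, not a site-endorsed procedure: it assumes ''rsync'' is installed on both ends, the helper name ''transfer_cmd'' and the ''./dataset'' path are made up for illustration, and the host name and ''/home/<username>'' destination are taken from the Login and storage sections of this page. The helper only prints the command so you can inspect it before running it.

```shell
# Hypothetical helper: print an rsync command that copies a local directory
# to your cluster home directory through a front end node.
# -a keeps permissions and timestamps, -v is verbose, -P shows progress and
# lets an interrupted transfer resume.
transfer_cmd() {
    src="$1"              # local file or directory (placeholder)
    user="$2"             # your cluster username
    dest="/home/$user/"   # assumed destination, see the storage section
    echo "rsync -avP $src $user@fe.ai.cs.uchicago.edu:$dest"
}

transfer_cmd ./dataset cnetid
```

Keep the 20G home quota in mind when choosing what to copy.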
Line 161: Line 168:
 Make a new ssh connection to tunnel your traffic. The format will be something like: Make a new ssh connection to tunnel your traffic. The format will be something like:
  
-''%%ssh -N -L 8888:###.###.###.###:#### user@fe01.ai.cs.uchicago.edu.edu%%''
+''%%ssh -N -L 8888:###.###.###.###:#### user@fe01.ai.cs.uchicago.edu%%''
  
 This command will appear to hang since we are using the -N option which tells ssh not to run any commands including a shell on the remote machine. This command will appear to hang since we are using the -N option which tells ssh not to run any commands including a shell on the remote machine.
Line 168: Line 175:
 ==== Interactive ==== ==== Interactive ====
   - ''%%srun --pty bash%%'' run an interactive job   - ''%%srun --pty bash%%'' run an interactive job
-  - ''%%unset XDG_RUNTIME_DIR%%'' jupyter tries to use the value of this environment variable to store some files, by defaut it is set to '' and that causes errors when trying to run juypter notebook.
+  - ''%%unset XDG_RUNTIME_DIR%%'' jupyter tries to use the value of this environment variable to store some files; by default it is set to ''<nowiki>''</nowiki>'', which causes errors when trying to run jupyter notebook.
   - ''%%export NODEIP=$(hostname -i)%%'' get the ip address of the node you are using   - ''%%export NODEIP=$(hostname -i)%%'' get the ip address of the node you are using
   - ''%%export NODEPORT=$(( $RANDOM + 1024 ))%%'' get a random port above 1024   - ''%%export NODEPORT=$(( $RANDOM + 1024 ))%%'' get a random port above 1024
   - ''%%echo $NODEIP:$NODEPORT%%'' echo the env var values to use later   - ''%%echo $NODEIP:$NODEPORT%%'' echo the env var values to use later
-  - ''%%jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser%%' start the jupyter notebook
+  - ''%%jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser%%'' start the jupyter notebook
   - Make a new ssh connection with a tunnel to access your notebook   - Make a new ssh connection with a tunnel to access your notebook
   - ''%%ssh -N -L 8888:$NODEIP:$NODEPORT user@fe01.ai.cs.uchicago.edu%%'' using the values not variables   - ''%%ssh -N -L 8888:$NODEIP:$NODEPORT user@fe01.ai.cs.uchicago.edu%%'' using the values not variables
-  - This will make an ssh tunnel on your local machine that fowards traffic sent to ''%%localhost:8888%%'' to ''%%$NODEIP:$NODEPORT%%'' via the ssh tunnel. This command will appear to hang since we are using the -N option which tells ssh not to run any commands including a shell on the remote machine.
+  - This will make an ssh tunnel on your local machine that forwards traffic sent to ''%%localhost:8888%%'' to ''%%$NODEIP:$NODEPORT%%'' via the ssh tunnel. This command will appear to hang since we are using the -N option which tells ssh not to run any commands including a shell on the remote machine.
   - Open your local browser and visit: ''%%http://localhost:8888%%''   - Open your local browser and visit: ''%%http://localhost:8888%%''
  
 +Alternatively, paste the following code snippet directly into the shell on the interactive node:
 +<code>
 +unset XDG_RUNTIME_DIR
 +NODEIP=$(hostname -i)
 +NODEPORT=$(( $RANDOM + 1024))
 +echo "ssh command: ssh -N -L 8888:$NODEIP:$NODEPORT `whoami`@fe01.ai.cs.uchicago.edu"
 +jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser
 +</code>
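The tunnel itself still has to be opened from your local machine, using the literal address the node printed. The following is a minimal sketch of that local step, assuming you copied the ''IP:PORT'' pair by hand; the function name ''tunnel_cmd'' and the sample address ''10.0.0.5:4242'' are illustrative and not part of the cluster setup.

```shell
# Hypothetical local-side helper: turn the "IP:PORT" string echoed on the
# compute node into the matching ssh tunnel command.
tunnel_cmd() {
    endpoint="$1"            # e.g. 10.0.0.5:4242 (made-up value)
    user="$2"                # your cluster username
    ip="${endpoint%%:*}"     # text before the first colon
    port="${endpoint##*:}"   # text after the last colon
    echo "ssh -N -L 8888:$ip:$port $user@fe01.ai.cs.uchicago.edu"
}

tunnel_cmd 10.0.0.5:4242 cnetid
```

Run the printed command in a separate local terminal; it will appear to hang because of ''-N'', and the notebook is then reachable at ''%%http://localhost:8888%%''.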
  
 ====== Contribution Policy ===== ====== Contribution Policy =====
 This section can be ignored by most people. [[techstaff:aicluster-admin|If you contributed to the cluster or are in a group that has you can read more here]]. This section can be ignored by most people. [[techstaff:aicluster-admin|If you contributed to the cluster or are in a group that has you can read more here]].
  
/var/lib/dokuwiki/data/attic/slurm/ai.1614706803.txt.gz · Last modified: 2021/03/02 11:40 by kauffman
