====== AI Cluster - Slurm ======
Feedback is requested:
  
[[https://discord.gg/ZVjX8Gv|#ai-cluster Discord channel]] or email Phil Kauffman (kauffman@cs dot uchicago dot edu).
  
Knowledge of how to use Slurm is preferred at this stage of testing.
  
  
The information from the older cluster mostly applies, and I suggest you read that documentation: https://howto.cs.uchicago.edu/techstaff:slurm
  
  
====== Infrastructure ======
Summary of nodes installed on the cluster.

===== Compute/GPU Nodes =====
  * 6x nodes
    * 2x Xeon Gold 6130 CPU @ 2.10GHz (64 threads)
    * 192G RAM
    * 4x Nvidia GeForce RTX2080Ti

  * 2x nodes
    * 2x Xeon Gold 6130 CPU @ 2.10GHz (64 threads)
    * 384G RAM
    * 4x Nvidia Quadro RTX 8000

  * all:
    * zfs mirror mounted at /local
      * compression set to lz4: usually this gives a performance gain, since less data is read from and written to disk, with a small overhead in CPU usage.
      * As of right now there is no mechanism to clean up /local. At some point I'll probably put a find command in cron that deletes files older than 90 days or so.
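The cleanup described above could be sketched as a small shell function. This is only a dry-run sketch: no cleanup job is actually deployed, and the 90-day cutoff is the tentative number mentioned above, not a policy.

```shell
# Dry-run sketch of the planned /local cleanup (nothing is deployed yet).
# Lists regular files under the given directory not modified in the given
# number of days; a real cron job would append -delete after verifying.
cleanup_old() {
    dir=$1
    days=$2
    find "$dir" -xdev -type f -mtime "+$days" -print
}

# Hypothetical use on a compute node: cleanup_old /local 90
```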

===== Storage =====

  * ai-storage1:
    * 41T total storage
    * uplink to cluster network: 2x 25G
    * /home/<username>
      * We intend to set user quotas; however, there are no quotas right now.
    * /net/projects: (please ignore this for now)
      * Lives on the home directory server.
      * The idea is to create a dataset with a quota for people to use.
      * The normal LDAP groups that you are used to, and that are available everywhere else, would control access to these directories, e.g. jonaslab, sandlab.

  * ai-storage2:
    * 41T total storage
    * uplink to cluster network: 2x 25G
    * /net/scratch: create yourself a directory /net/scratch/$USER. Use it for whatever you want.
    * Eventually data will be auto-deleted after a set amount of time, likely 90 days or whatever we determine makes sense.

  * ai-storage3:
    * zfs mirror with previous snapshots of 'ai-storage1'.
    * NOT a backup.
    * Not enabled yet.
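The one-time scratch setup mentioned in the list above can be done with a single command. The ''SCRATCH_ROOT'' variable is only there so the sketch can be exercised outside the cluster; on the cluster it is simply /net/scratch.

```shell
# One-time setup: create your personal scratch directory.
SCRATCH_ROOT=${SCRATCH_ROOT:-/net/scratch}            # path from the docs above
mkdir -p "$SCRATCH_ROOT/$USER" 2>/dev/null || true    # no-op if it already exists
```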

====== Login ======
There is a set of front-end nodes that give you access to the Slurm cluster. You will connect through these nodes, and you need to be on one of them to submit jobs to the cluster.

    ssh cnetid@fe.ai.cs.uchicago.edu

  * Requires a CS account.

==== File Transfer ====
You will use the FE nodes to transfer your files onto the cluster storage infrastructure. The network connections on those nodes are 2x 10G each.

=== Quota ===
  * By default users are given a quota of 20G.
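A quick way to see how far into that 20G you are is to total up your home directory (a generic sketch; the cluster may also expose a proper quota report):

```shell
# Report how much space your home directory currently uses.
usage=$(du -sh "$HOME" | awk '{print $1}')
echo "home usage: $usage"
```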
  
====== Demo ======
  
kauffman3 is my CS test account.
  
<code>
...
</code>
  
==== Notes on CUDA_VISIBLE_DEVICES ====
CUDA_VISIBLE_DEVICES: shows the relative GPU device numbers available to you.

  * This variable should NOT be modified. Ever.
  * Relative means that if you requested one GPU it will show up as device 0, even if all the other GPUs on the server are being used by others.
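Because the numbering is relative, scripts should treat the variable as an opaque comma-separated list rather than assume physical indices. A small sketch (the ''ngpus'' helper name is made up for illustration) that counts the GPUs a job was given:

```shell
# Count GPUs visible to this job by counting the comma-separated
# (relative) indices in CUDA_VISIBLE_DEVICES, e.g. "0" or "0,1".
ngpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
    else
        echo "$CUDA_VISIBLE_DEVICES" | awk -F, '{print NF}'
    fi
}

# Inside a job that requested one GPU, ngpus prints 1 and the single
# visible device is index 0, whatever physical GPU it maps to.
```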
  
  
===== Fairshare/QOS =====
By default all usage is tracked and charged to a user's default account. A fairshare value is computed and used to prioritize a job on submission.
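You can inspect the computed fairshare values yourself with Slurm's ''sshare'' tool; this invocation is taken from an earlier revision of this page (kauffman3 and kauffman4 are test accounts):

```shell
# Show fairshare standing (shares, usage, FairShare factor) for accounts/users.
sshare --long --accounts=kauffman3,kauffman4 --users=kauffman3,kauffman4
```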
  
Details are being worked out for anyone who donates to the cluster. This will be some sort of tiered system where you get to use higher priority when you need it.
You will need to charge an account on job submission (''%%--account=<name>%%'') and most likely select the priority level that you wish to use and are allowed to use (''%%--qos=<level>%%'').
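Once those tiers exist, a submission would name both the account and the QOS level. A sketch of the batch-script form of the same flags; the account name, QOS level, and ''train.py'' are placeholders, not real cluster values:

```shell
#!/bin/bash
# Hypothetical submission header; account and qos names are placeholders.
#SBATCH --account=ericj_group   # the account to charge
#SBATCH --qos=high              # the priority tier you are allowed to use
#SBATCH --gres=gpu:1            # request one GPU

srun python train.py            # train.py stands in for your program
```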
  
  
> Do we have a max job runtime?
  
Yes. 4 hours. This is set per partition. You are expected to write your code to accommodate this.
<code>
PartitionName=geforce Nodes=a[001-006] Default=YES DefMemPerCPU=2900 MaxTime=04:00:00 State=UP Shared
</code>
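One common way to live within the 4-hour cap (a sketch, not site policy) is to have your program checkpoint itself and have the batch script resubmit until the work is done. The script contents, ''--resume'' flag, checkpoint file, and done-flag are all hypothetical:

```shell
#!/bin/bash
#SBATCH --time=04:00:00          # matches the partition MaxTime above
# Hypothetical pattern: train.py writes checkpoints as it runs and
# creates done.flag when finished; until then, requeue another window.
python train.py --resume latest.ckpt
if [ ! -f done.flag ]; then
    sbatch "$0"                  # submit this script again
fi
```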
  
  
===== Jupyter Notebook Tips =====
==== Batch ====
The process for a batch job is very similar to the interactive one below.

jupyter-notebook.sbatch
<code>
#!/bin/bash
unset XDG_RUNTIME_DIR
NODEIP=$(hostname -i)
NODEPORT=$(( $RANDOM + 1024 ))
echo "ssh command: ssh -N -L 8888:$NODEIP:$NODEPORT `whoami`@fe01.ai.cs.uchicago.edu"
. ~/myenv/bin/activate
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser
</code>

Check the output of your job to find the ssh command to use when accessing your notebook.

Make a new ssh connection to tunnel your traffic. The format will be something like:

''%%ssh -N -L 8888:###.###.###.###:#### user@fe01.ai.cs.uchicago.edu%%''
  
This command will appear to hang, since the -N option tells ssh not to run any commands, including a shell, on the remote machine.
  
Open your local browser and visit: ''%%http://localhost:8888%%''
==== Interactive ====
  - ''%%srun --pty bash%%'' run an interactive job
  - ''%%unset XDG_RUNTIME_DIR%%'' jupyter tries to use the value of this environment variable to store some files; its default value causes errors when trying to run jupyter notebook
  - ''%%export NODEIP=$(hostname -i)%%'' get the IP address of the node you are using
  - ''%%export NODEPORT=$(( $RANDOM + 1024 ))%%'' get a random port above 1024
  - ''%%echo $NODEIP:$NODEPORT%%'' echo the values to use later
  - ''%%jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser%%'' start the jupyter notebook
  - Make a new ssh connection with a tunnel to access your notebook
  - ''%%ssh -N -L 8888:$NODEIP:$NODEPORT user@fe01.ai.cs.uchicago.edu%%'' using the actual values, not the variables
  - This creates an ssh tunnel on your local machine that forwards traffic sent to ''%%localhost:8888%%'' to ''%%$NODEIP:$NODEPORT%%''. The command will appear to hang, since the -N option tells ssh not to run any commands, including a shell, on the remote machine
  - Open your local browser and visit: ''%%http://localhost:8888%%''
  
/var/lib/dokuwiki/data/pages/techstaff/aicluster.txt · Last modified: 2021/01/06 16:11 by kauffman