===== About Machines =====
Last updated: May 2026

====== AI Cluster - Slurm ======

If this is your first time using the AI cluster, please send in a ticket requesting to be added. You will need to be involved in research with a CS faculty member. Feedback is welcome: find us in the Slack ai-cluster channel (channel ID: C02KW3M0BDK).

====== Infrastructure ======

Summary of the nodes installed on the cluster:

  AI Cluster Specs
  ---------------------
  CPU Cores:  2960
  System Mem: 34389 GB
  GPU Memory: 8032 GB
  GPUs:       92 A40 48GB
              52 L40S 48GB
              14 H100 80GB
  Storage:    483 TB
  ---------------------

===== Compute/GPU Nodes =====

We like the alphabet, so we have compute node groups for just about every letter in it.

  * "a" series: 3 nodes, each with 64 CPU threads, 192GB RAM, four RTX 2080 Ti GPUs
  * "aa" series: 2 nodes, each with 32 CPU threads, 32GB RAM, four RTX 2080 GPUs
  * "b", "d", "e", "k", "r" series: 15 nodes, each with 64 CPU threads, 512GB RAM, four A40s
  * "c" series: 1 node with 48 CPU threads, 64GB RAM, two A30s
  * "f" & "j" series: 6 nodes, each with 32 CPU threads, 128GB RAM, four A40s
  * "g" & "q" series: 4 nodes, each with 96 CPU threads, 1TB RAM, eight L40S GPUs
  * "h" series: 1 node with 96 CPU threads, 1TB RAM, four H100 SXM GPUs
  * "l" series: 1 node with 256 CPU threads, 1.5TB RAM, six H100 PCIe GPUs
  * "m" series: 3 nodes, each with 128 CPU threads, 1.5TB RAM, no GPUs
  * "n" series: 1 node with 96 CPU threads, 1.5TB RAM, four H100 SXM GPUs
  * "t" series: 5 nodes, each with 48 CPU threads, 512GB RAM, four L40S GPUs
  * all compute nodes:
    * Each node has a /local space for times when it is beneficial not to write over NFS. Space in /local varies from node to node; please clean up when you are done.
    * Home directories and project space are mounted over NFS. The default quota for home directories is 50GB, but it may be increased as needed with permission.
    * Research groups may additionally be allocated project space on separate storage servers, outside the home directory quota, for collaboration and shared storage.

===== Storage =====

  * ai-storage1:
    * 63TB total storage
    * uplink to cluster network: 100G
    * /home/
      * 50GB quota per user.
  * ai-storage2:
    * 63TB total storage
    * uplink to cluster network: 100G
    * /net/scratch: Create yourself a directory /net/scratch/$USER and use it for whatever you want.
      * Eventually data will be auto-deleted after a set amount of time, likely 90 days or whatever we determine makes sense.
  * ai-storage3:
    * ZFS mirror with previous snapshots of ai-storage1 and ai-storage4.
  * ai-storage4:
    * 70TB total storage
    * uplink to cluster network: 100G
    * /net/projects:
      * The idea is to create a dataset with a quota for people in a collaboration group to use.
      * The normal LDAP groups that you are used to, and that are available everywhere else, control access to these directories (e.g. jonaslab, sandlab).
  * peanut-storage1:
    * 273TB total storage
    * uplink to cluster network: 100G fiber
    * /net/bulk:
      * A good place for large datasets that either don't change much or are used and re-used a lot.
    * /net/archive:
      * A place to keep data from projects2 that isn't actively being worked on.
  * peanut-storage2:
    * 546TB total storage
    * uplink to cluster network: 100G fiber
    * backups from peanut-storage3 and peanut-storage4
  * peanut-storage3:
    * 224TB total storage
    * uplink to cluster network: 100G fiber
    * /net/projects2:
      * Additional project space for your research projects.
  * peanut-storage4:
    * 28TB total storage
    * uplink to cluster network: 100G fiber
    * instructional storage for the Peanut Cluster (NOT research)
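
===== Example Job =====

The pieces above come together in a typical Slurm job: request GPUs on a compute node, stage working data in node-local /local scratch, and copy anything you want to keep back to NFS before the job ends. The sketch below is illustrative only: the partition name, GPU type string, and dataset path are assumptions, not values from this cluster; run ''sinfo'' to see the real partition and GRES names.

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=general       # ASSUMPTION: check `sinfo` for this cluster's real partition names
#SBATCH --gres=gpu:a40:1          # ASSUMPTION: the GPU type string may differ; `sinfo -o "%G"` lists them
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G

# Stage data in node-local /local scratch; fall back to a temp dir if /local is absent.
scratch=$(mktemp -d /local/"$USER".XXXXXX 2>/dev/null || mktemp -d)
trap 'rm -rf "$scratch"' EXIT     # clean up /local when the job exits, as requested above

# Hypothetical dataset path on NFS; replace with your own data.
cp -r "$HOME/my_dataset" "$scratch"/ 2>/dev/null || true

# ... run your GPU workload here, reading from "$scratch" ...

# Copy results you want to keep back to NFS (home or /net/projects) before exit.
```

The ''trap ... EXIT'' line is the important habit: it removes your /local scratch directory whether the job finishes, fails, or is cancelled, so the node stays clean for the next user.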