Table of contents
  1. GPU queue/partitions
  2. GPU partitions preemption rule
  3. Special nodes
  4. Allocating GPUs in SLURM
  5. Job array with multiple tasks on each GPU
    1. The JobSubmit.sh file
    2. Submitting the job
  6. Start interactive mode/terminal/bash on GPU worker/compute nodes

GPU queue/partitions

Partitions      GPUs        Nodes          CPUs  Memory  Time limit
qTRDGPU[H/M/L]  gpu:V100:8  dgx001-dgx004  40    512GB   5d 8h
                gpu:V100:4  gn001-gn002    40    192GB   5d 8h
                gpu:A100:8  dgxa001        40    1TB     5d 8h
qTRDGPU         gpu:RTX:1   agn001-agn020  64    512GB   5d 8h

GPU partitions preemption rule

Partitions  Priority  Limitations          Preemption
qTRDGPUH    high      Max 4 GPUs per user  N/A
qTRDGPUM    medium    Max 8 GPUs per user  suspend
qTRDGPUL    low       N/A                  suspend
qTRDGPU     N/A       N/A                  N/A

Special nodes

Nodes                      CPUs  Memory  GPUs       Purpose
arctrdgndev101.rs.gsu.edu  4     62GB    TITAN X:2  GPU development & testing
trendsagn019.rs.gsu.edu    64    512GB   gpu:RTX:1  GPU development & testing

Allocating GPUs in SLURM

When allocating GPUs in SLURM, pass the value from the GPUs column of the table above to the --gres option, adjusting the trailing count to the number of GPUs you actually need (the table lists the maximum per node). See the examples below.
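For example, a minimal sketch of a batch request for two V100 GPUs on the qTRDGPUH partition (the account code is a placeholder; the same --gres value also works with srun):

#SBATCH -p qTRDGPUH
#SBATCH -A <slurm_account_code>
#SBATCH --gres=gpu:V100:2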

Job array with multiple tasks on each GPU

The JobSubmit.sh file

#!/bin/bash
# Resources per array task: 1 node, 1 task, 10 CPU cores, 50 GB RAM, one V100 GPU
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 10
#SBATCH --mem=50g
#SBATCH --gres=gpu:V100:1
#SBATCH -p qTRDGPUH
#SBATCH -t 4-00
#SBATCH -J <job name>
# %A = job ID, %a = array task index
#SBATCH -e error%A-%a.err
#SBATCH -o out%A-%a.out
#SBATCH -A <slurm_account_code>
# Allow the allocation to share (oversubscribe) resources with other jobs
#SBATCH --oversubscribe
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email address>

sleep 10s

# Log the worker node name to the error file
echo $HOSTNAME >&2

source <path to conda installation>/bin/activate <name/path of conda environment>

# Run two copies of the script on the same GPU and wait for both to finish
python script.py --arg1 $SLURM_ARRAY_TASK_ID &
python script.py --arg1 $SLURM_ARRAY_TASK_ID &
wait

sleep 10s

Submitting the job

sbatch --array=1-8%2 JobSubmit.sh
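Here 1-8 creates eight array tasks (SLURM_ARRAY_TASK_ID values 1 through 8) and %2 limits the array to at most two tasks running at the same time. Since each task launches two copies of script.py on its GPU, up to four processes run concurrently on two GPUs.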

Start interactive mode/terminal/bash on GPU worker/compute nodes

The following command can be run on the login node:

# Start interactive mode on a GPU worker node
$ srun -p qTRDGPUH -A <slurm_account_code> -v -n1 --pty --mem=10g --gres=gpu:V100:1 /bin/bash
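Once the interactive shell opens on the worker node, an optional sanity check (assuming the standard NVIDIA tools are on the node) is to confirm the GPU before starting work, then exit to release the allocation:

# Inside the interactive shell on the worker node
$ nvidia-smi   # show the GPU(s) visible to this session
$ exit         # leave the shell and release the allocation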
