GPU queue/partitions
| Partitions | GPUs | Nodes | CPUs | Memory | Time limit |
|---|---|---|---|---|---|
| qTRDGPU[H/M/L] | gpu:V100:8 | dgx001-dgx004 | 40 | 512GB | 5d 8h |
| qTRDGPU[H/M/L] | gpu:V100:4 | gn001-gn002 | 40 | 192GB | 5d 8h |
| qTRDGPU[H/M/L] | gpu:A100:8 | dgxa001 | 40 | 1TB | 5d 8h |
| qTRDGPU | gpu:RTX:1 | agn001-agn020 | 64 | 512GB | 5d 8h |
GPU partitions preemption rule
| Partitions | Priority | Limitations | Preemption |
|---|---|---|---|
| qTRDGPUH | high | Max 4 GPUs per user | N/A |
| qTRDGPUM | medium | Max 8 GPUs per user | suspend |
| qTRDGPUL | low | N/A | suspend |
| qTRDGPU | N/A | N/A | N/A |
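The priority tier only needs to change at submission time, since command-line options override the matching #SBATCH directives in a script. A minimal sketch, assuming the JobSubmit.sh job-array script shown below and a placeholder account code:

# Submit to the medium-priority partition instead of qTRDGPUH; these jobs
# may be suspended while higher-priority qTRDGPUH jobs occupy the GPUs.
sbatch -p qTRDGPUM -A <slurm_account_code> --array=1-8%2 JobSubmit.sh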
Special nodes
| Nodes | CPUs | Memory | GPUs | Purpose |
|---|---|---|---|---|
| arctrdgndev101.rs.gsu.edu | 4 | 62GB | TITAN X:2 | GPU development & testing |
| trendsagn019.rs.gsu.edu | 64 | 512GB | gpu:RTX:1 | GPU development & testing |
Allocating GPUs in SLURM
When allocating GPUs in SLURM, use the value in the GPUs column of the tables above as the --gres parameter. See examples below.
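For instance, the directives below are a sketch with illustrative GPU counts (use only one --gres line per job); each requests a GPU type by the identifier listed in the GPUs column:

#SBATCH --gres=gpu:V100:1   # one V100 (dgx001-dgx004 or gn001-gn002)
#SBATCH --gres=gpu:A100:2   # two A100s (dgxa001)
#SBATCH --gres=gpu:RTX:1    # one RTX (agn001-agn020, qTRDGPU partition)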
Job array with multiple tasks on each GPU
The JobSubmit.sh file:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 10
#SBATCH --mem=50g
#SBATCH --gres=gpu:V100:1
#SBATCH -p qTRDGPUH
#SBATCH -t 4-00
#SBATCH -J <job name>
#SBATCH -e error%A-%a.err
#SBATCH -o out%A-%a.out
#SBATCH -A <slurm_account_code>
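# Allow this allocation to share node resources with other running jobs
# (exactly what can be shared depends on the cluster configuration)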
#SBATCH --oversubscribe
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email address>
sleep 10s
echo $HOSTNAME >&2
source <path to conda installation>/bin/activate <name/path of conda environment>
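# Run two copies of the worker script in the background on the same allocated GPU,
# each with the array task ID as an argument; wait blocks until both finish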
python script.py --arg1 $SLURM_ARRAY_TASK_ID &
python script.py --arg1 $SLURM_ARRAY_TASK_ID &
wait
sleep 10s
Submitting the job
sbatch --array=1-8%2 JobSubmit.sh
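Here --array=1-8%2 creates array tasks 1 through 8 and runs at most two of them at a time. A sketch with different, purely illustrative values:

# 20 array tasks, at most 4 running concurrently
sbatch --array=1-20%4 JobSubmit.sh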
Start interactive mode/terminal/bash on GPU worker/compute nodes
The following command can be run from the login node:
# Start interactive mode on a GPU worker node
$ srun -p qTRDGPUH -A <slurm_account_code> -v -n1 --pty --mem=10g --gres=gpu:V100:1 /bin/bash
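Once the shell starts, you are on the allocated GPU node. A quick sanity check and clean exit (assuming nvidia-smi is available on the GPU nodes, as is typical):

# Inside the interactive shell
$ nvidia-smi   # confirm the allocated V100 is visible
$ exit         # release the allocation when finished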