Table of contents
- Run Ollama as a chatbot on your local machine
- Run Ollama models in Python on TReNDS cluster
Ollama is an open-source framework for running language models on your local machine. The models it runs are not especially large, so strictly speaking they are not "large" language models, but it is customary to call them LLMs anyway: gpt-oss, gemma, qwen, etc.
Run Ollama as a chatbot on your local machine
Step 1. On your local machine, create a virtual environment and install Open WebUI.
- Create a virtual environment for Open WebUI: `python -m venv openwebui`. The virtual environment will be created in the current directory. You can also create a conda environment instead.
- Activate the virtual environment: `source openwebui/bin/activate`.
- Install Open WebUI: `pip install open-webui`.
Step 2. On the TReNDS cluster, submit a SLURM job and start Ollama.
- Start Ollama by submitting the following SLURM job script. Check the status of the job with `squeue -u <username>`, where `<username>` is your username. The output shows the node where Ollama is running under `NODELIST(REASON)`, in the format `arctrdagnXXX`, where `XXX` is a number. Let's use `OLLAMA_NODE` to refer to the node where Ollama is running. Once the job is running, you can verify that the server responds; see the sketch after the script below.
#!/bin/bash
#SBATCH -p qTRDGPU
#SBATCH -A trends53c17
#SBATCH -t 00:30:00
#SBATCH -c 24
#SBATCH --mem=100g
#SBATCH --gres=gpu:A40:2
# Add trends apps to your path
export PATH=/trdapps/linux-x86_64/bin/:$PATH
# Ensure both GPUs are visible to Ollama
export CUDA_VISIBLE_DEVICES=0,1
# Set environment variables for large context RAG optimization
export OLLAMA_HOST_MEMORY=false
export OLLAMA_KEEP_ALIVE=-1
export OLLAMA_MMAP=true
export GGML_CUDA_FORCE_CUBLAS=1
export GGML_CUDA_FORCE_MMQ=1
export OLLAMA_HOST=0.0.0.0
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_MODELS=/data/users4/splis/ollama/models/
# Force GPU backend
export OLLAMA_BACKEND=gpu
# Run ollama serve
ollama serve
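
Once `squeue` shows the job running, you can check that the Ollama server is answering requests before going further. Run this from a machine that can reach the compute node, such as a cluster login node. A minimal sketch, assuming the placeholder node name `arctrdagnXXX`; it queries Ollama's `/api/tags` endpoint, which lists the models available on the server.

import requests

OLLAMA_NODE = "arctrdagnXXX"  # TODO: replace with your OLLAMA_NODE from squeue

# /api/tags lists the models the running Ollama server can serve
response = requests.get(f"http://{OLLAMA_NODE}:11434/api/tags", timeout=10)
response.raise_for_status()
for model in response.json().get("models", []):
    print(model["name"])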
Step 3. On your local machine, connect to the cluster and start chatting.
- Run this command in the terminal: `ssh -L 8081:localhost:11434 -J <username>@arctrdagn019 <username>@<OLLAMA_NODE> -fN`. This forwards local port 8081 to Ollama's port 11434 on `OLLAMA_NODE`, using `arctrdagn019` as a jump host.
- Then run: `OLLAMA_BASE_URL="http://localhost:8081" open-webui serve`.
- Open http://localhost:8080 in your browser. Also read the output of the previous command in case it shows a different local address.
- Create a user and start chatting.
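
Open WebUI is not the only client that can use this tunnel. Ollama also exposes an OpenAI-compatible endpoint under `/v1`, so any OpenAI-style client can reach it through the forwarded port. A minimal sketch, assuming the `openai` Python package is installed and that the model named here actually exists on the server:

from openai import OpenAI

# Point the client at the SSH tunnel; Ollama ignores the API key,
# but the client requires one to be set
client = OpenAI(base_url="http://localhost:8081/v1", api_key="ollama")

response = client.chat.completions.create(
    model="gemma3-optimized:27b",  # assumed model name; pick one from /api/tags
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)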
Run Ollama models in Python on TReNDS cluster
- Run Step 2 in the section above to submit the SLURM job script and start Ollama.
- See the following example of using an Ollama model in a Python script. Remember to change the variable `OLLAMA_NODE` to the node where Ollama is running.
import json
import requests

OLLAMA_NODE = "arctrdagnXXX"  # TODO: Change it to the node where Ollama is running
BASE_URL = f"http://{OLLAMA_NODE}:11434/api/chat"

model = "gemma3-optimized:27b"  # TODO: Change it to the model you want to use
message = "What is the capital of France?"  # TODO: Change it to your prompt

response = requests.post(
    BASE_URL,
    json={
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "stream": False,
    },
)
print(json.dumps(response.json(), indent=2))
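
With `"stream": False` the server returns the whole reply in a single JSON object. For long generations you may prefer streaming, where Ollama sends one JSON object per line as tokens are produced. A minimal sketch of the streaming variant, using the same assumed node and model names as above:

import json
import requests

OLLAMA_NODE = "arctrdagnXXX"  # TODO: Change it to the node where Ollama is running
BASE_URL = f"http://{OLLAMA_NODE}:11434/api/chat"

payload = {
    "model": "gemma3-optimized:27b",  # assumed model name
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": True,  # ask the server to stream the reply
}

with requests.post(BASE_URL, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # each streamed line is one JSON object
        # Print the next fragment of the assistant's reply as it arrives
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()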