Table of contents
- Run Ollama as a chatbot on your local machine
- Run Ollama models in Python on TReNDS cluster
Ollama is an open-source framework for running language models on your local machine. The models it runs are not especially large, so strictly speaking they are not "large" language models, but it is customary to call them LLMs anyway: gpt-oss, gemma, qwen, etc.
Run Ollama as a chatbot on your local machine
Step 1. On your local machine, create a virtual environment and install Open WebUI.
- Create a virtual environment for Open WebUI: `python -m venv openwebui`. The virtual environment will be created in the current directory. You can also create a conda environment instead.
- Activate the virtual environment: `source openwebui/bin/activate`.
- Install Open WebUI: `pip install open-webui`.
Step 2. On the TReNDS cluster, submit a SLURM job and start Ollama.
- Start Ollama by submitting the following SLURM job script. Check the status of the job with `squeue -u <username>`, where `<username>` is your username. The output shows the node where Ollama is running under `NODELIST(REASON)`, in the format `arctrdagnXXX`, where `XXX` is a number. Let's use `OLLAMA_NODE` to refer to the node where Ollama is running. Once the job is running, you can verify that the server responds; see the sketch after the script below.
#!/bin/bash
#SBATCH -p qTRDGPU
#SBATCH -A trends53c17
#SBATCH -t 00:30:00
#SBATCH -c 24
#SBATCH --mem=100g
#SBATCH --gres=gpu:A40:2
# Add trends apps to your path
export PATH=/trdapps/linux-x86_64/bin/:$PATH
# Ensure both GPUs are visible to Ollama
export CUDA_VISIBLE_DEVICES=0,1
# Set environment variables for large context RAG optimization
export OLLAMA_HOST_MEMORY=false
export OLLAMA_KEEP_ALIVE=-1
export OLLAMA_MMAP=true
export GGML_CUDA_FORCE_CUBLAS=1
export GGML_CUDA_FORCE_MMQ=1
export OLLAMA_HOST=0.0.0.0
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_MODELS=/data/users4/splis/ollama/models/
# Force GPU backend
export OLLAMA_BACKEND=gpu
# Run ollama serve
ollama serve
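
Once `squeue` shows the job running, you can check that the Ollama server is answering requests before going further. Run this from a machine that can reach the compute node, such as a cluster login node. A minimal sketch, assuming the placeholder node name `arctrdagnXXX`; it queries Ollama's `/api/tags` endpoint, which lists the models available on the server.

import requests

OLLAMA_NODE = "arctrdagnXXX"  # TODO: replace with your OLLAMA_NODE from squeue

# /api/tags lists the models the running Ollama server can serve
response = requests.get(f"http://{OLLAMA_NODE}:11434/api/tags", timeout=10)
response.raise_for_status()
for model in response.json().get("models", []):
    print(model["name"])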
Step 3. On your local machine, connect to the cluster and start chatting.
- Run this command in the terminal: `ssh -L 8081:localhost:11434 -J <username>@arctrdagn019 <username>@<OLLAMA_NODE> -fN`. This forwards local port 8081 to Ollama's port 11434 on `OLLAMA_NODE`, using `arctrdagn019` as a jump host.
- Then run: `OLLAMA_BASE_URL="http://localhost:8081" open-webui serve`.
- Open http://localhost:8080 in your browser. Also read the output of the previous command in case it shows a different local address.
- Create a user and start chatting.
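
Open WebUI is not the only client that can use this tunnel. Ollama also exposes an OpenAI-compatible endpoint under `/v1`, so any OpenAI-style client can reach it through the forwarded port. A minimal sketch, assuming the `openai` Python package is installed and that the model named here actually exists on the server:

from openai import OpenAI

# Point the client at the SSH tunnel; Ollama ignores the API key,
# but the client requires one to be set
client = OpenAI(base_url="http://localhost:8081/v1", api_key="ollama")

response = client.chat.completions.create(
    model="gemma3-optimized:27b",  # assumed model name; pick one from /api/tags
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)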
Run Ollama models in Python on TReNDS cluster
- Run Step 2 in the section above to submit the SLURM job script and start Ollama.
- See the following example of using an Ollama model in a Python script. Remember to change the variable `OLLAMA_NODE` to the node where Ollama is running.
import json
import requests

OLLAMA_NODE = "arctrdagnXXX"  # TODO: Change it to the node where Ollama is running
BASE_URL = f"http://{OLLAMA_NODE}:11434/api/chat"

model = "gemma3-optimized:27b"  # TODO: Change it to the model you want to use
message = "What is the capital of France?"  # TODO: Change it to your prompt

response = requests.post(
    BASE_URL,
    json={
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "stream": False,
    },
)
print(json.dumps(response.json(), indent=2))
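
With `"stream": False` the server returns the whole reply in a single JSON object. For long generations you may prefer streaming, where Ollama sends one JSON object per line as tokens are produced. A minimal sketch of the streaming variant, using the same assumed node and model names as above:

import json
import requests

OLLAMA_NODE = "arctrdagnXXX"  # TODO: Change it to the node where Ollama is running
BASE_URL = f"http://{OLLAMA_NODE}:11434/api/chat"

payload = {
    "model": "gemma3-optimized:27b",  # assumed model name
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": True,  # ask the server to stream the reply
}

with requests.post(BASE_URL, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # each streamed line is one JSON object
        # Print the next fragment of the assistant's reply as it arrives
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()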