Ollama is an open-source framework for running language models on the local machine.
Run Ollama as a chatbot on your local machine
Step 1. On your local machine, create a virtual environment and install Open WebUI.
- Create a virtual environment for Open WebUI:
python -m venv openwebui
The virtual environment will be created in the current directory. You can also create a conda environment instead.
- Activate the virtual environment:
source openwebui/bin/activate
- Install Open WebUI:
pip install open-webui
Step 2. On the TReNDS cluster, submit a SLURM job and start Ollama.
- Start Ollama by submitting the following SLURM job script: save it to a file (e.g., start_ollama.sh; the name is only an example) and submit it with sbatch start_ollama.sh. Check the status of the job with squeue -u <username>, where <username> is your username. The node where Ollama is running appears in the output under NODELIST(REASON), in the format arctrdagnXXX, where XXX is a number. Let's use OLLAMA_NODE to refer to the node where Ollama is running. A quick way to confirm that the server is up is shown after the script below.
#!/bin/bash
#SBATCH -p qTRDGPU
#SBATCH -A trends53c17
#SBATCH -t 00:30:00
#SBATCH -c 24
#SBATCH --mem=100g
#SBATCH --gres=gpu:A40:2
# Add trends apps to your path
export PATH=/trdapps/linux-x86_64/bin/:$PATH
# Ensure both GPUs are visible to Ollama
export CUDA_VISIBLE_DEVICES=0,1
# Set environment variables for large context RAG optimization
export OLLAMA_HOST_MEMORY=false
export OLLAMA_KEEP_ALIVE=-1
export OLLAMA_MMAP=true
export GGML_CUDA_FORCE_CUBLAS=1
export GGML_CUDA_FORCE_MMQ=1
export OLLAMA_HOST=0.0.0.0
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_MODELS=/data/users4/splis/ollama/models/
# Force GPU backend
export OLLAMA_BACKEND=gpu
# Run ollama serve
ollama serve
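Once the job is running, you can confirm that the Ollama server is reachable by querying its /api/tags endpoint, which lists the models available to it, from a node that can reach OLLAMA_NODE. The following is a minimal sketch, assuming Python with the requests package is available where you run it; the node name arctrdagnXXX is a placeholder for the value reported by squeue.
import requests

OLLAMA_NODE = "arctrdagnXXX"  # placeholder: replace with the node reported by squeue

# /api/tags lists the models the running Ollama server can serve
resp = requests.get(f"http://{OLLAMA_NODE}:11434/api/tags", timeout=10)
resp.raise_for_status()
for m in resp.json().get("models", []):
    print(m["name"])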
Step 3. On your local machine, connect to the cluster and start chatting.
- Run this command in the terminal to create an SSH tunnel from your machine to Ollama on the cluster (replace <username> with your username and <OLLAMA_NODE> with the node from Step 2):
ssh -L 8081:localhost:11434 -J <username>@arctrdagn019 <username>@<OLLAMA_NODE> -fN
- Then run this command to start Open WebUI, pointing it at the tunnel:
OLLAMA_BASE_URL="http://localhost:8081" open-webui serve
- Open http://localhost:8080 in your browser. Also check the output of the above command in case it shows a different local address. If the page does not load, see the optional tunnel check below.
- Create a user and start chatting.
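As an optional check, you can verify that the tunnel forwards local port 8081 to the Ollama server before starting Open WebUI; an Ollama server normally answers its root URL with the text "Ollama is running". Below is a minimal sketch in Python, assuming the requests package is installed in your local environment.
import requests

# Confirm the SSH tunnel: local port 8081 should reach Ollama on the cluster
resp = requests.get("http://localhost:8081", timeout=10)
print(resp.status_code, resp.text)  # expect 200 and "Ollama is running"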
Run Ollama models in Python on the TReNDS cluster
- Follow Step 2 in the section above to submit the SLURM job script and start Ollama.
- See the following example of using an Ollama model in a Python script. Remember to change the variable OLLAMA_NODE to the node where Ollama is running.
import json
import requests
OLLAMA_NODE = "arctrdagnXXX" # TODO: Change it to the node where Ollama is running
BASE_URL = f"http://{OLLAMA_NODE}:11434/api/chat"
model = "gemma3-optimized:27b" # TODO: Change it to the model you want to use
message = "What is the capital of France?" # TODO: Change it to the message you want to ask the model
response = requests.post(
    BASE_URL,
    json={
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "stream": False,
    },
)
print(json.dumps(response.json(), indent=2))
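If you want the reply token by token instead of waiting for the full response, the same /api/chat endpoint also supports streaming: with "stream": True, Ollama returns one JSON object per line, and the last object has "done": true. Below is a minimal streaming variant of the example above, under the same assumptions (replace OLLAMA_NODE and the model name as needed).
import json
import requests

OLLAMA_NODE = "arctrdagnXXX"  # TODO: Change it to the node where Ollama is running
BASE_URL = f"http://{OLLAMA_NODE}:11434/api/chat"

with requests.post(
    BASE_URL,
    json={
        "model": "gemma3-optimized:27b",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "stream": True,
    },
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries a piece of the assistant's message until "done" is true
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            print()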