Ollama is a front-end written in Go that wraps the llama.cpp back-end.
Here are the steps for setting up Ollama in a k8s cluster.
(1) Write a k8s yaml file (see the sketch below) that:
- uses the Ollama docker image (ollama/ollama),
- mounts persistent storage for the models at /root/.ollama,
- exposes Ollama port 11434 as a service (LoadBalancer).
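A minimal manifest might look like the following sketch. The PVC name, storage size, and labels are assumptions here; adjust them for your cluster.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models            # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi              # models are large; size to taste
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: models
          mountPath: /root/.ollama   # persistent storage for pulled models
      volumes:
      - name: models
        persistentVolumeClaim:
          claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  type: LoadBalancer
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434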
(2) Log in to the Ollama pod and pull models:
kubectl exec -it comrite-web-ollama-8676669bb-jvx2z -- /bin/bash
nohup ollama pull llama2 &
nohup ollama pull codellama &
We do not need to run "ollama run llama2" inside the container, because the image already starts ollama serve; it loads a model on demand when a request arrives.
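To confirm which models have been pulled and are available to the server, you can list them through the API (using the same LoadBalancer IP as in the tests below):

curl http://192.168.86.88:11434/api/tags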
(3) Test:
curl http://192.168.86.88:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
curl -X POST http://192.168.86.88:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write me a function that outputs the fibonacci sequence"
}'
You can pass a parameter to disable streaming:
"stream": false
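For example, the same generate request then returns a single JSON response instead of a stream of chunks:

curl http://192.168.86.88:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'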