
Ollama is a front-end written in Go that wraps the llama.cpp back-end.

Here are the steps to set up Ollama in a k8s cluster.

(1) Write a k8s yaml file that:

uses the ollama docker image ( ollama/ollama ),

mounts persistent storage for the models ( mounted as /root/.ollama ),

and exposes ollama port 11434 as a service (load balancer).

A sketch of such a manifest is shown below.
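
This is a minimal sketch only, assuming your cluster has a default StorageClass; the names (comrite-web-ollama, ollama-models) and the 50Gi size are placeholders to adjust for your setup:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models            # placeholder name for the model storage claim
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi              # room for a few models; adjust as needed
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: comrite-web-ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama          # official image; runs "ollama serve" by default
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: models
          mountPath: /root/.ollama    # models stored here survive pod restarts
      volumes:
      - name: models
        persistentVolumeClaim:
          claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  type: LoadBalancer                  # exposes port 11434 outside the cluster
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434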

(2) Log in to the ollama pod and pull the models:

kubectl exec -it comrite-web-ollama-8676669bb-jvx2z -- /bin/bash

nohup ollama pull llama2 &

nohup ollama pull codellama &

We do not need to run "ollama run llama2" inside the container, because the image already starts ollama serve, which loads a model on demand when a request for it comes in.
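
To confirm the pulls completed, you can list the locally available models from inside the same pod (ollama list is part of the standard ollama CLI):

ollama list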

(3) Test:

curl http://192.168.86.88:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'

curl -X POST http://192.168.86.88:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write me a function that outputs the fibonacci sequence"
}'

You can pass a parameter to disable streaming:

"stream": false
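
For example, the same generate request with streaming disabled (reusing the host IP and model from the tests above), so the full response comes back as a single JSON object:

curl http://192.168.86.88:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'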
