Chinese Yellow Pages | Classifieds | Knowledge | Tax | IME

(1) use container

https://github.com/abetlen/llama-cpp-python/pkgs/container/llama-cpp-python

(2) mount k8s storage as /models

export MODEL point to the right llama-model.gguf

(3) expose 8000 to loadbalancer to outside

(4) browse to ip:8000/docs for API exploration

Leave a Reply

Your email address will not be published. Required fields are marked *