Use case: I want to create a service based on Mistral 7B that will serve an internal office of 8-10 users.
I’ve been looking at modal.com and RunPod. Are there any other recommendations?
I noticed TheBloke was using Massed Compute to quantize models. I’ve been poking around and using their hardware a bit more.
Huge fan of Modal; I’ve been using them for a couple of serverless LLM and diffusion models. It can definitely be on the costly side, but I like that the cost scales directly with requests and the setup is trivial.
A recent project with Modal: https://github.com/sshh12/llm-chat-web-ui/tree/main/modal
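For flavor, here's the rough shape of a Modal deployment, as a minimal sketch rather than code from the repo above. The app name, GPU type, and model ID are placeholders I picked, and Modal's API has evolved, so check their docs:

```python
# Sketch: serving Mistral 7B on Modal behind vLLM (names are illustrative).
import modal

app = modal.App("mistral-office")
image = modal.Image.debian_slim().pip_install("vllm")

@app.cls(gpu="A10G", image=image)
class Mistral:
    @modal.enter()
    def load(self):
        # Runs once per container start, so the weights load on cold
        # start rather than on every request.
        from vllm import LLM
        self.llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")

    @modal.method()
    def generate(self, prompt: str) -> str:
        from vllm import SamplingParams
        out = self.llm.generate([prompt], SamplingParams(max_tokens=256))
        return out[0].outputs[0].text

@app.local_entrypoint()
def main():
    # `modal run this_file.py` spins up a GPU container on demand.
    print(Mistral().generate.remote("Say hi to the office."))
```

The scale-to-zero part is what makes the per-request pricing work for a small office: you pay for GPU time only while someone is actually waiting on a completion.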
In our internal lab office, we’re using https://ollama.ai/ with https://github.com/ollama-webui/ollama-webui to host LLMs locally; the docker compose setup provided by the ollama-webui team worked like a charm for us.
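If it helps anyone wiring internal tools into that setup: Ollama exposes a plain REST API on port 11434, so services can call it directly. A quick sketch (the model tag and prompt are just examples):

```python
# Query a locally running Ollama server over its REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",  # any model you've pulled with `ollama pull`
        "prompt": "Summarize our onboarding doc in three bullets.",
        "stream": False,     # return one JSON object instead of a stream
    },
)
print(resp.json()["response"])
```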
Do you have hardware to serve the API or do you want to run this from the cloud?
Looking at cloud as an option. Don’t really have hardware now.
I can recommend vLLM. It also offers an OpenAI-compatible API server, if you want that.
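Sketch of what that looks like, assuming the stock vLLM OpenAI-compatible server on its default port 8000:

```python
# Start the server separately, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.1
# Then any OpenAI client can talk to it:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM ignores the key unless you configure one
)
resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Hello from the office!"}],
)
print(resp.choices[0].message.content)
```

Nice side effect for your use case: existing GPT integrations (CMS hooks, etc.) can be pointed at it just by swapping the base URL.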
Have you thought about running it off a local M1 Mac mini? Ollama uses the Mac GPU out of the box.
WebAssembly-based open-source LLM inference (API service and local hosting): https://github.com/second-state/llama-utils
Hmm, cool. It seems the inference app itself is only a few MBs.
Just curious. What are you using it for?
Knowledge base, general GPT use, interaction with our CMS to add or update data.
Let us know what you end up going with, OP! I’m interested in something like this as well…