I don't have the budget to host models on a dedicated GPU. What are the alternative options or platforms that let me use open-source models like Mistral, Llama, etc. on a pay-per-API-call basis?
What’s the use case? Chatting with them, or for your own apps?
Check out OpenRouter too.
It may be out of your range, but you can pick up a Dell Precision 7720 with a 16GB Quadro P5000 GPU for about $500 on eBay. The P5000 also shows up in a few other workstation laptops from that era. Note: those machines shipped with other graphics options too, so only go for P5000 models.
OpenRouter. Some of the models hosted there are even free.
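For anyone wondering what pay-per-call looks like in practice: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a request is just an HTTP POST with your API key. A minimal sketch below, stdlib only; the model slug is an assumption, check openrouter.ai/models for current names and per-token prices.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, model="mistralai/mistral-7b-instruct"):
    """Build the JSON payload for an OpenAI-style chat completion call.

    The model slug here is an example; swap in any model listed on
    openrouter.ai/models (Llama, Mixtral, etc.).
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt, model="mistralai/mistral-7b-instruct"):
    """Send one prompt and return the assistant's reply text.

    Expects your key in the OPENROUTER_API_KEY environment variable;
    you're billed per token on each call, no GPU rental involved.
    """
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_request(prompt, model)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Since the API mirrors OpenAI's, the official `openai` Python client also works if you point its `base_url` at OpenRouter.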
Google Colab, though it depends on how long Google will let you use it for free (you can also pay monthly).
Hugging Face has Inference Endpoints, which can be private or public as needed, with scale-to-zero (sleep) built in.
I'm currently exploring different models too, in particular for coding. I tried deepseek-coder on their official website and it was good. Unfortunately they collect chat data. Does anyone know of a pay-as-you-go service that offers this model?
https://www.anyscale.com/endpoints#hosted Good service. I use it all the time. It also has fine-tuning options if you need them.
One I've used before is runpod.io, but it's a pay-per-time platform (you rent GPU hours), not pay-per-API-call.