I’m currently trying to figure out where it's cheapest to host and use these models.
I realized that a lot of fine-tunes aren't available on the common LLM API sites. I want to use Nous Capybara 34B, for example, but the only site that offered it charged $20/million tokens, which seems quite high considering that I see Llama 70B for around $0.70/million tokens.
So are there any sites where I could host custom fine-tunes and get rates similar to the one mentioned?
Would a service like RunPod work for you? It sells you GPU power by the hour instead of by the token.
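To compare hourly GPU rental against per-token pricing, you can convert the hourly rate into an effective price per million tokens. Here's a rough sketch of that arithmetic. The hourly rate, throughput, and utilization numbers below are made-up placeholders, not real RunPod prices or measured speeds, and the function names are just for illustration, so plug in your own figures.

```python
def cost_per_million_tokens(hourly_rate_usd, tokens_per_second, utilization=1.0):
    """Effective USD per million tokens when renting a GPU by the hour.

    utilization is the fraction of each paid hour actually spent generating
    tokens (idle time is still billed, so lower utilization raises the cost).
    """
    if hourly_rate_usd < 0 or tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("rate must be >= 0, throughput > 0, utilization in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_rate_usd / tokens_per_hour * 1_000_000


def breakeven_tokens_per_second(hourly_rate_usd, target_usd_per_million):
    """Sustained throughput needed for hourly rental to match a per-token price."""
    if target_usd_per_million <= 0:
        raise ValueError("target price must be > 0")
    return hourly_rate_usd / target_usd_per_million * 1_000_000 / 3600


if __name__ == "__main__":
    # All three values are hypothetical placeholders, not quotes or benchmarks.
    HOURLY_RATE_USD = 2.00      # assumed GPU rental price per hour
    TOKENS_PER_SECOND = 100.0   # assumed total throughput across all requests
    UTILIZATION = 0.5           # assumed share of paid time spent generating

    effective = cost_per_million_tokens(HOURLY_RATE_USD, TOKENS_PER_SECOND, UTILIZATION)
    print(f"Effective cost: ${effective:.2f} per million tokens")

    # Prices quoted in the question: $20/M (Capybara 34B) and $0.70/M (Llama 70B).
    for target in (20.00, 0.70):
        needed = breakeven_tokens_per_second(HOURLY_RATE_USD, target)
        print(f"To match ${target:.2f}/M you need about {needed:.0f} tokens/s sustained")
```

The takeaway is that hourly rental only beats a per-token API if you keep the GPU busy. With light or bursty usage you pay for idle time, so the effective per-token price can end up well above what the hourly rate suggests.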