I use koboldcpp for local LLM deployment. It's clean, it's easy, and it supports sliding context. It can also act as a drop-in replacement for the OpenAI API.
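Since it speaks the OpenAI wire format, existing client code mostly just needs the base URL swapped. A minimal sketch, assuming koboldcpp is running locally with its OpenAI-compatible endpoint enabled (the port and the `"local"` model name here are assumptions; adjust to your setup):

```python
import json
import urllib.request

# Assumed local endpoint -- koboldcpp commonly listens on port 5001;
# change this if your instance is configured differently.
KOBOLD_URL = "http://localhost:5001/v1/chat/completions"

def build_request(prompt, max_tokens=128, temperature=0.7):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": "local",  # placeholder; a local server typically ignores this field
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def ask(prompt):
    """Send the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        KOBOLD_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same payload works against the real OpenAI endpoint, which is the whole point of the drop-in compatibility.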
Most of the time the issue is the prompt template, especially the spacing: `###instruction` vs `### instruction`, etc.
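To make the spacing point concrete, here's a minimal sketch using an Alpaca-style template as the example (the exact template depends on the model you're running):

```python
def alpaca_prompt(instruction, spaced=True):
    """Format an Alpaca-style prompt.

    `spaced` toggles between '### Instruction:' and '###Instruction:' --
    visually similar, but different token sequences to the model.
    """
    tag = "### " if spaced else "###"
    return f"{tag}Instruction:\n{instruction}\n\n{tag}Response:\n"

correct = alpaca_prompt("Summarize this article.", spaced=True)
wrong = alpaca_prompt("Summarize this article.", spaced=False)
```

A model fine-tuned on the spaced form can degrade noticeably when fed the unspaced one, because the tokenizer produces different tokens for the two headers.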
Smaller models need a good prompt. I tried the newer Mistral 2.5 7B and prompts work superbly on it.
How are you serving your GPTQ models?
You are famous everywhere for those comparisons.