I use koboldcpp for local LLM deployment. It's clean, it's easy, and it supports sliding context. It can also act as a drop-in replacement for the OpenAI API.
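Since it speaks the OpenAI wire format, existing client code mostly just needs the base URL swapped. A minimal sketch, assuming koboldcpp is running locally with its OpenAI-compatible endpoint enabled (the port and the `"local"` model name here are assumptions; adjust to your setup):

```python
import json
import urllib.request

# Assumed local endpoint -- koboldcpp commonly listens on port 5001;
# change this if your instance is configured differently.
KOBOLD_URL = "http://localhost:5001/v1/chat/completions"

def build_request(prompt, max_tokens=128, temperature=0.7):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": "local",  # placeholder; a local server typically ignores this field
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def ask(prompt):
    """Send the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        KOBOLD_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same payload works against the real OpenAI endpoint, which is the whole point of the drop-in compatibility.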
Most of the time the issue is the prompt template, especially the spacing: `###instruction` vs `### instruction`, etc.
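To make the spacing point concrete, here's a minimal sketch using an Alpaca-style template as the example (the exact template depends on the model you're running):

```python
def alpaca_prompt(instruction, spaced=True):
    """Format an Alpaca-style prompt.

    `spaced` toggles between '### Instruction:' and '###Instruction:' --
    visually similar, but different token sequences to the model.
    """
    tag = "### " if spaced else "###"
    return f"{tag}Instruction:\n{instruction}\n\n{tag}Response:\n"

correct = alpaca_prompt("Summarize this article.", spaced=True)
wrong = alpaca_prompt("Summarize this article.", spaced=False)
```

A model fine-tuned on the spaced form can degrade noticeably when fed the unspaced one, because the tokenizer produces different tokens for the two headers.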
Smaller models need a good prompt. I tried the newer Mistral 2.5 7B and prompts work superbly on it.
How are you serving your GPTQ models?
You are famous everywhere for those comparisons.