DustGrouchy1792@alien.top (OP) to LocalLLaMA • Any tricks to speed up 13B models on a 3090?
1 year ago

I’m now using a 4-bit GPTQ version of the same model. After generation completes, VRAM usage sits at 16.2 GB (out of 24 GB), and as best I can tell nothing else is using the GPU (no browser windows with YouTube, etc.).
I’m still only getting a bit under 4 tokens per second. Since there’s plenty of VRAM left over, I don’t think anything is getting offloaded to the CPU.
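For anyone who wants to check the same two numbers on their own setup, here is a minimal Python sketch. It assumes a hypothetical `generate` callable that returns the list of generated tokens (this is a stand-in, not any particular backend's API), and the helper names are made up for illustration. Free VRAM is only a rough hint that the model fits on the GPU, not proof that nothing is running on the CPU.

```python
import time


def tokens_per_second(generate, *args, **kwargs):
    """Time one call to `generate` and return (n_tokens, seconds, tokens_per_sec).

    `generate` is a hypothetical callable that returns a sequence of
    generated tokens; swap in whatever your backend actually exposes.
    """
    start = time.perf_counter()
    tokens = generate(*args, **kwargs)
    elapsed = time.perf_counter() - start
    n_tokens = len(tokens)
    rate = n_tokens / elapsed if elapsed > 0 else float("inf")
    return n_tokens, elapsed, rate


def vram_headroom_gb(used_gb, total_gb):
    """Return the unused VRAM in GB, given used and total figures."""
    return total_gb - used_gb


if __name__ == "__main__":
    # Figures from the post: 16.2 GB used out of 24 GB.
    print(f"VRAM headroom: {vram_headroom_gb(16.2, 24.0):.1f} GB")
```

Timing the whole call includes prompt processing as well as generation, so the rate it reports can come out lower than what a UI shows for generation alone.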
Can I get KoboldCpp working with SillyTavern without too much of a headache?