frontenbrecher@alien.top to LocalLLaMA • llama2 13B on Gtx 1070 · 1 year ago
Use koboldcpp to split the model between GPU and CPU, with a model in GGUF format, preferably a Q4_K_S quantization for better speed. Expect it to be slow, possibly 1-2 tokens per second, since the layers that don't fit in the card's 8 GB of VRAM run on the CPU.
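To pick how many layers to put on the GPU, a rough back-of-the-envelope estimate can help. This is a minimal sketch under loose assumptions: the function name is hypothetical, each layer is assumed to take an equal share of the GGUF file, and the 1.5 GB reserve for context and driver overhead is a guess, not a measured value. The ~7.4 GB file size for a 13B Q4_K_S model is also approximate.

```python
import math


def estimate_gpu_layers(model_size_gb: float, n_layers: int,
                        vram_gb: float, reserve_gb: float = 1.5) -> int:
    """Hypothetical helper: rough count of layers that fit in VRAM.

    Assumes every layer takes an equal share of the model file and that
    reserve_gb (a guess) covers the context/KV cache and driver overhead.
    """
    per_layer_gb = model_size_gb / n_layers
    # VRAM left for weights after the assumed reserve; never negative.
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Can't offload more layers than the model has.
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


# Assumed figures: ~7.4 GB Q4_K_S file, 40 layers (Llama 2 13B), 8 GB GTX 1070.
print(estimate_gpu_layers(7.4, 40, 8.0))
```

With these assumed numbers it suggests about 35 of the 40 layers on the GPU. Treat that as a starting point for the GPU Layers setting in koboldcpp and lower it if you run out of VRAM once the context fills up.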