I was trying to run some models using text-generation-webui. I was testing mistralai_Mistral-7B-v0.1, NousResearch_Llama-2-7b-hf and stabilityai_StableBeluga-7B, without much success. I remember I needed to switch to 4-bit quantization, and even then I couldn't get them to respond well in chat. So before I go further: is 8GB of VRAM a problem? Stable Diffusion works beautifully on the same card. Could you share your experiences with an 8GB GPU?
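For what it's worth, a rough back-of-the-envelope estimate (the overhead figure is a guess, not a measurement) suggests a 4-bit 7B model should fit in 8GB:

```python
# Rough VRAM estimate for a 7B model at 4-bit quantization.
# Assumptions: ~0.5 bytes per weight for Q4, plus a guessed
# ~1.5 GB for KV cache, context buffers, and runtime overhead.
params = 7e9
bytes_per_param = 0.5                        # Q4 ≈ 4 bits/weight
weights_gb = params * bytes_per_param / 1e9  # ≈ 3.5 GB of weights
overhead_gb = 1.5                            # rough guess, grows with context length
print(f"~{weights_gb + overhead_gb:.1f} GB total")  # ≈ 5 GB, under 8 GB
```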
Download koboldcpp and get the GGUF version of any model you want, preferably a 7B from our pal TheBloke. Only get Q4 or higher quantization; Q6 is a bit slow but works well. In koboldcpp.exe, select CuBLAS and set the GPU layers to 35-40, something like the launch sketch below.
You should get about 5 T/s or more.
This is the simplest method to run LLMs, from my testing.
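If you'd rather skip the GUI launcher, a minimal sketch of the same setup from Python, assuming the flag names (`--usecublas`, `--gpulayers`) match your koboldcpp release (check `koboldcpp.exe --help`) and swapping in your own model path:

```python
import subprocess

# Launch koboldcpp with CuBLAS acceleration and partial GPU offload.
# The model filename here is a placeholder; use whatever Q4+ GGUF
# you downloaded from TheBloke.
subprocess.run([
    "koboldcpp.exe",
    "--model", "mistral-7b-v0.1.Q4_K_M.gguf",  # hypothetical path
    "--usecublas",         # CuBLAS GPU acceleration
    "--gpulayers", "35",   # offload 35 layers; lower if you run out of VRAM
])
```

If generation crashes or slows to a crawl, dropping `--gpulayers` a few at a time is usually enough to find what your 8GB card can hold.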