Using GPTQ when there is not enough video memory on the GPU. How do others handle this?
I read somewhere that a video card can borrow system RAM to make up for the shortage of its own memory, but the memory taken from RAM is roughly 10 times slower. How is that done? If I'm not mistaken, it requires installing a specific version of the video card driver. I have an RTX 3060 12GB and 64GB of RAM.
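For context, the software route I've seen people mention (instead of relying on the driver-level RAM fallback) is splitting the model between the GPU and system RAM with transformers/accelerate. A rough sketch of what I imagine that looks like, with a placeholder model id and memory caps I made up for my hardware; I'm not even sure the GPTQ kernels tolerate CPU-offloaded layers, so treat this as a guess, not a working recipe:

```python
# Sketch: load a GPTQ checkpoint and let layers that don't fit on the GPU
# spill into system RAM via accelerate's device_map. The repo id is a
# placeholder; the memory limits are guesses for a 3060 12GB + 64GB RAM box.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/SomeModel-GPTQ"  # placeholder: substitute a real GPTQ repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                        # accelerate decides the GPU/CPU split
    max_memory={0: "11GiB", "cpu": "48GiB"},  # cap the 3060 below 12 GiB, rest in RAM
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```

Is something like this what people actually do, or is the driver-level fallback the usual approach?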
Maybe this is not the smartest idea, considering that I can already get decent speed with GGUF, but I've heard that with exllama2 the speed is about twice as fast when the model runs on the video card.
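The GGUF setup I know works is partial offload through llama-cpp-python, roughly like this (path and layer count are placeholders I'd tune against actual VRAM usage, assuming the library is built with CUDA support):

```python
# Sketch: run a GGUF model with some layers on the 3060 and the rest in RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=35,  # layers kept on the GPU; lower this if VRAM runs out
    n_ctx=4096,       # context window; larger contexts also cost VRAM
)

out = llm("Hello,", max_tokens=32)
print(out["choices"][0]["text"])
```

So the question is whether switching to GPTQ + exllama2 with RAM spillover would actually beat this, or whether the spillover kills the speed advantage.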
Help me figure out what’s what.