  • If you want a GPU with 12GB of VRAM to do most of the work, 70b is way too big. You need to be looking at 13b models: MythoMax, Tiefighter, CausalLM (actually 14b), etc. Mistral 7b mashups (Dolphin, OpenHermes, etc.) are decent too.

    Koboldcpp is probably the easiest, most intuitive Windows option for getting up and running with GPU support. Enable CuBLAS, then offload layers to the GPU: maybe 41/43 layers for 13b models, all 35/35 for 7b models (there's a minimal API example at the end of this comment). That works well with my RTX 4070 Ti, which has the same amount of VRAM.

    I usually use 6-bit quantized versions of 13b models and 8-bit quantized versions of 7b models; there's some rough size math at the end of this comment showing why those fit. Maybe try lower-bit versions if the slower GPU causes a performance hit (I doubt it will be noticeable).

    Download models from TheBloke if possible. There's no need to go handing your email address over to CausalLM and others who ask for it when he doesn't require it.
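
    To see why those sizes line up with 12GB of VRAM, here's a rough back-of-the-envelope sketch. The bits-per-weight figures are approximations for the llama.cpp Q6_K (~6.56 bpw) and Q8_0 (~8.5 bpw) quant formats, not anything koboldcpp reports:

    ```python
    # Rough weight-size estimate for quantized GGUF models.
    # KV cache and runtime overhead come on top, which is why a 13b
    # Q6_K model still needs a few layers left on the CPU at 12GB.

    def weights_gb(params_billions: float, bits_per_weight: float) -> float:
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    for name, params, bpw in [
        ("70b @ 4-bit", 70, 4.0),   # ~35 GB: hopeless on 12GB
        ("13b @ Q6_K", 13, 6.56),   # ~10.7 GB: offload ~41/43 layers
        ("7b @ Q8_0", 7, 8.5),      # ~7.4 GB: fits fully, 35/35 layers
    ]:
        print(f"{name}: ~{weights_gb(params, bpw):.1f} GB of weights")
    ```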
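
    And once koboldcpp is running (started with something like `python koboldcpp.py --model <model>.gguf --usecublas --gpulayers 41`), you can talk to it over HTTP. A minimal sketch, assuming the default port of 5001 and the KoboldAI-compatible /api/v1/generate endpoint that koboldcpp exposes:

    ```python
    # Minimal client for a locally running koboldcpp instance.
    # Port and payload fields follow the KoboldAI API convention;
    # adjust if you launched koboldcpp with a different --port.
    import json
    import urllib.request

    payload = {
        "prompt": "Explain GPU layer offloading in one sentence.",
        "max_length": 80,      # tokens to generate
        "temperature": 0.7,
    }
    req = urllib.request.Request(
        "http://localhost:5001/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["results"][0]["text"])
    ```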


  • metamec@alien.top to LocalLLaMA · AMD vs Intel · 1 year ago

    I would say research the CPU/mobo combo, not just the CPU. In addition, make sure you buy RAM that is on the motherboard manufacturer's supported list (the QVL). I have a Ryzen 7 5800 and an MSI MAG B550 TOMAHAWK motherboard. Not gonna lie, I always consider AMD a risk due to chipset-related quirks I've experienced with previous builds, but it has been fine so far.