QuIP# is a novel quantization method; its 2-bit quality is better than anything previously available.
Repository: https://github.com/Cornell-RelaxML/quip-sharp
Blog post: https://cornell-relaxm...
With Llama-2-70b-chat-E8P-2Bit from their model zoo, QuIP# seems fairly promising. I’d have to try l2-70b-chat in exl2 at 2.4 bpw to compare, but so far this model does not really feel like a 2-bit model; I’m impressed.
From the issue about this in the exllamav2 repo, QuIP# was using more memory and running slower than exl2. How much context can you fit?
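For a rough sense of the headroom left for context, here is a back-of-envelope sketch (my own numbers, not from either repo): it counts weight storage only and ignores the KV cache, activations, and per-layer quantization metadata, so real VRAM use will be higher.

```python
# Back-of-envelope weight-memory estimate for a ~70B-parameter model.
# Assumption: memory ~= params * bits-per-weight / 8; ignores KV cache,
# activations, and quantization metadata (codebooks, scales, etc.).
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 1024**3

llama2_70b = 70e9  # approximate parameter count

print(f"QuIP# 2-bit  : ~{weight_gib(llama2_70b, 2.0):.1f} GiB")  # ~16.3 GiB
print(f"exl2 2.4 bpw : ~{weight_gib(llama2_70b, 2.4):.1f} GiB")  # ~19.6 GiB
```

On paper the 2-bit weights leave a few extra GiB for context versus 2.4 bpw on the same card, but the actual difference depends on each backend's runtime overhead, which is what the exllamav2 issue was pointing at.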