QuIP#: SOTA 2-bit quantization method, now implemented in text-generation-webui (experimental)

oobabooga4@alien.top · 2 years ago

I’m desensitized at this point. I wonder whether this is yet another “Pretraining on the Test Set Is All You Need” marketing stunt, as most new models lately have been.

oobabooga4@alien.top · 2 years ago

transformers library PR: GrammarConstrainedLogitsProcessor, compatible with llama.cpp GBNF

oobabooga4@alien.top · 2 years ago

Gradio is a 70 MB requirement, FYI. It has become common to see people calling text-generation-webui “bloated”, when most of the installation size is in fact due to PyTorch and the CUDA runtime libraries.

https://preview.redd.it/pgfsdld7xw0c1.png?width=370&format=png&auto=webp&s=c50a14804350a1391d57d0feac8a32a5dcf36f68