I’ve tested pretty much all of the available quantization methods, and I prefer exllamav2 for everything I run on GPU: it’s fast and gives high-quality results. If anyone wants to experiment with different calibration parquets, I’ve taken a portion of the PIPPA data and converted it into various prompt formats, along with a portion of the Synthia instruction/response pairs, which I’ve also converted into different prompt formats. I’ve only tested them on OpenHermes, but they did produce coherent models, and each one generates different output from the same prompt.
https://desync.xyz/calsets.html
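If you’d rather build a parquet like these yourself, here’s a rough sketch of the idea in Python: wrap each instruction/response pair in a prompt template (ChatML here, since that’s what OpenHermes uses) and dump the rows to parquet. The example pairs, the "text" column name, and the template are my own guesses, not anything exllamav2 mandates, so check the convert script for what it actually expects.

```python
import pandas as pd

# Hypothetical instruction/response pairs; swap in your own PIPPA or Synthia subset.
pairs = [
    {"instruction": "Explain what a calibration dataset does during quantization.",
     "response": "It supplies sample text so the quantizer can pick scales that minimize error."},
    {"instruction": "Name one GPU-focused quantization format.",
     "response": "EXL2, produced by exllamav2."},
]

# ChatML-style template; adjust this function for other prompt formats.
def to_chatml(row):
    return (
        "<|im_start|>user\n" + row["instruction"] + "<|im_end|>\n"
        "<|im_start|>assistant\n" + row["response"] + "<|im_end|>\n"
    )

df = pd.DataFrame(pairs)

# Assumption: the converter just wants rows of raw text in the parquet; the
# "text" column name is a guess on my part.
cal = pd.DataFrame({"text": df.apply(to_chatml, axis=1)})
cal.to_parquet("chatml_calibration.parquet", index=False)
```

Then point exllamav2’s conversion script at the file with its calibration dataset option (`-c` last time I checked) and quantize as usual.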