🐺🐦‍⬛ LLM Format Comparison/Benchmark: 70B GGUF vs. EXL2 (and AWQ)

WolframRavenwolf@alien.top · 3 years ago

CosmosisQ@alien.top · 3 years ago

Hell yeah! Two days in a row! We need more people doing format comparisons and benchmarks in general. Again, thank you for all of your hard work, and keep 'em coming!

How would you say EXL2 subjectively compares to GGUF? Have you had the chance to roleplay with both formats outside of Voxta+VaM (i.e., in SillyTavern)? I ask because I’m sure the increased generation speed matters more than anything else when using Voxta+VaM, so it might be easier to compare their output quality in SillyTavern.

On that note, would you say you now prefer using lzlv (70B, EXL2) over OpenChat 3.5 (7B, GGUF) with Voxta+VaM?

| Model | Format | Quant | Offloaded Layers | VRAM Used | Primary Score | Secondary Score | Speed +mmq | Speed -mmq |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lizpreciatior/lzlv_70B.gguf | GGUF | Q4_K_M | 83/83 | 39362.61 MB | 18/18 | 4+3+4+6 = 17/18 | | |
| lizpreciatior/lzlv_70B.gguf | GGUF | Q5_K_M | 70/83 ! | 40230.62 MB | 18/18 | 4+3+4+6 = 17/18 | | |
| TheBloke/lzlv_70B-GGUF | GGUF | Q2_K | 83/83 | 27840.11 MB | 18/18 | 4+3+4+6 = 17/18 | 4.20T/s | 4.01T/s |
| TheBloke/lzlv_70B-GGUF | GGUF | Q3_K_M | 83/83 | 31541.11 MB | 18/18 | 4+3+4+6 = 17/18 | 4.41T/s | 3.96T/s |
| TheBloke/lzlv_70B-GGUF | GGUF | Q4_0 | 83/83 | 36930.11 MB | 18/18 | 4+3+4+6 = 17/18 | 4.61T/s | 3.94T/s |
| TheBloke/lzlv_70B-GGUF | GGUF | Q4_K_M | 83/83 | 39362.61 MB | 18/18 | 4+3+4+6 = 17/18 | 4.73T/s !! | 4.11T/s |
| TheBloke/lzlv_70B-GGUF | GGUF | Q5_K_M | 70/83 ! | 40230.62 MB | 18/18 | 4+3+4+6 = 17/18 | 1.51T/s | 1.46T/s |
| TheBloke/lzlv_70B-GGUF | GGUF | Q5_K_M | 80/83 | 46117.50 MB | OutOfMemory | | | |
| TheBloke/lzlv_70B-GGUF | GGUF | Q5_K_M | 83/83 | 46322.61 MB | OutOfMemory | | | |
| LoneStriker/lzlv_70b_fp16_hf-2.4bpw-h6-exl2 | EXL2 | 2.4bpw | | 11,11 -> 22 GB | BROKEN | | | |
| LoneStriker/lzlv_70b_fp16_hf-2.6bpw-h6-exl2 | EXL2 | 2.6bpw | | 12,11 -> 23 GB | FAIL | | | |
| LoneStriker/lzlv_70b_fp16_hf-3.0bpw-h6-exl2 | EXL2 | 3.0bpw | | 14,13 -> 27 GB | 18/18 | 4+2+2+6 = 14/18 | | |
| LoneStriker/lzlv_70b_fp16_hf-4.0bpw-h6-exl2 | EXL2 | 4.0bpw | | 18,17 -> 35 GB | 18/18 | 4+3+2+6 = 15/18 | | |
| LoneStriker/lzlv_70b_fp16_hf-4.65bpw-h6-exl2 | EXL2 | 4.65bpw | | 20,20 -> 40 GB | 18/18 | 4+3+2+6 = 15/18 | | |
| LoneStriker/lzlv_70b_fp16_hf-5.0bpw-h6-exl2 | EXL2 | 5.0bpw | | 22,21 -> 43 GB | 18/18 | 4+3+2+6 = 15/18 | | |
| LoneStriker/lzlv_70b_fp16_hf-6.0bpw-h6-exl2 | EXL2 | 6.0bpw | | > 48 GB | TOO BIG | | | |
| TheBloke/lzlv_70B-AWQ | AWQ | 4-bit | | | OutOfMemory | | | |
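
For anyone who wants to re-check the numbers in the table, here is a minimal Python sketch. It is not part of the original benchmark tooling. The helper names `parse_secondary` and `mmq_speedup_percent` are made up for illustration, and the input strings are copied by hand from the table cells. It re-adds the four partial secondary scores and computes how much faster the "+mmq" run is than the "-mmq" run.

```python
def parse_secondary(cell):
    """Split a cell like '4+3+4+6 = 17/18' into (sum of parts, stated score, max score)."""
    parts, stated = cell.split("=")
    total = sum(int(p) for p in parts.split("+"))
    score, maximum = (int(x) for x in stated.strip().split("/"))
    return total, score, maximum


def mmq_speedup_percent(with_mmq, without_mmq):
    """Relative speed gain of the +mmq run over the -mmq run, in percent."""
    # Cells look like '4.73T/s'; strip the unit before converting.
    fast = float(with_mmq.replace("T/s", ""))
    slow = float(without_mmq.replace("T/s", ""))
    return (fast / slow - 1.0) * 100.0


# Secondary score cells as they appear in the table (hand-copied).
secondary_cells = {
    "GGUF (all quants that ran)": "4+3+4+6 = 17/18",
    "EXL2 3.0bpw": "4+2+2+6 = 14/18",
    "EXL2 4.0bpw / 4.65bpw / 5.0bpw": "4+3+2+6 = 15/18",
}

for label, cell in secondary_cells.items():
    total, score, maximum = parse_secondary(cell)
    status = "ok" if total == score else "MISMATCH"
    print(f"{label}: parts add up to {total}, table says {score}/{maximum} ({status})")

# Speed cells for the TheBloke Q4_K_M row (hand-copied).
gain = mmq_speedup_percent("4.73T/s", "4.11T/s")
print(f"Q4_K_M: +mmq is about {gain:.1f}% faster than -mmq")
```

The same two helpers can be pointed at any other row by swapping in that row's cells. Rows that ended in OutOfMemory, BROKEN, FAIL, or TOO BIG have no score or speed cells to parse, so they are left out here.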

My AI Workstation:

Observations:

Conclusion: