What do these tests mean for LLMs? There are a lot of numbers, and I see that in most cases Qwen is better than GPT-4; in others it is worse, or much worse.
What everyone is most interested in now is how much better it is than 70B LLaMA.
70B, storytelling, q5_K_M
A friend told me that for a 70B model, q4 quantization drops performance by about 10%. The larger the model, the less it suffers from weight quantization.
120 thousand rubles.
I was an idiot when building the PC: for some reason I focused on the processor, and the video card is quite weak. After a while, though, I realized it was for the better. I can run a 70B model at 1 token per second. Maybe in the future I'll buy another video card so I can offload more layers and speed up processing.
RTX 3060 12 GB & i5-13600K
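A rough back-of-envelope estimate of why a 12 GB card can hold only part of a 70B model, which is why it runs at ~1 token/s with most layers on the CPU. The 4.5 bits/weight figure is my own approximation for a 4-bit k-quant, not a number from the thread:

```python
# Rough VRAM estimate for a quantized 70B model (illustrative figures only).
params = 70e9            # 70B parameters
bits_per_weight = 4.5    # assumed effective size of a q4-style k-quant
model_gb = params * bits_per_weight / 8 / 1e9

print(round(model_gb, 1))         # weight size in GB, before KV cache

vram_gb = 12                      # e.g. an RTX 3060
fraction_on_gpu = vram_gb / model_gb
print(round(fraction_on_gpu, 2))  # fraction of the model that fits in VRAM
```

So roughly 39 GB of weights against 12 GB of VRAM: only about a third of the layers fit on the GPU, and the rest run from system RAM, which dominates the tokens-per-second figure.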
Which tests did you evaluate this on?
I’m very interested in storytelling and RP