What do these tests mean for LLMs? There are many scores, and I see that in most cases qwen is better than gpt4; in others it is worse or much worse.
Right now everyone is most interested in how much better it is than 70b llama.
Secret_Joke_2262@alien.top to LocalLLaMA • Models Megathread #2 - What models are you currently using? • 2 years ago
70b Storytelling q5_k_m
Secret_Joke_2262@alien.top to LocalLLaMA • Quantizing 70b models to 4-bit, how much does performance degrade? • 2 years ago
A friend told me that for 70b, performance drops by about 10% when using q4. The larger the model, the less it suffers from weight quantization.
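The memory savings behind that trade-off are easy to sketch. Below is a rough size estimate for a 70B-parameter model at a few precisions; the bits-per-weight figures for the k-quants are approximations (q4_k_m and q5_k_m mix quantization types, so their effective rate is not exactly 4 or 5 bits), not exact numbers from any specific file.

```python
# Rough weight-storage estimate for a 70B-parameter model at different
# precisions. Ignores KV cache, activations, and file-format overhead.

PARAMS = 70e9  # 70 billion weights

def model_size_gb(bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Effective bits-per-weight values here are ballpark assumptions.
for name, bpw in [("fp16", 16.0), ("q5_k_m", 5.5), ("q4_k_m", 4.8)]:
    print(f"{name}: ~{model_size_gb(bpw):.0f} GB")
```

This is why a 70B model that needs ~140 GB in fp16 becomes feasible on a desktop at q4/q5: the weights shrink to roughly a third of that.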
Secret_Joke_2262@alien.top to LocalLLaMA • Anyone spend a bunch of $$ on a computer for LLM and regret it? • 2 years ago
120 thousand rubles.
I was an idiot when assembling the PC: I somehow inexplicably focused on the processor, and the video card is quite weak. However, after a while I realized that this was for the better. I can use the 70B model at 1 token per second. Maybe in the future I will buy another video card so that offloading more layers will help with data processing.
RTX 3060 12GB & 13600K
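A back-of-the-envelope calculation shows what partial GPU offload (llama.cpp's `--n-gpu-layers`/`-ngl`) buys on a 12 GB card. The figures below are assumptions for illustration: ~80 transformer layers (as in Llama-2-70B) and a q5_k_m file of roughly 48 GB, with a guessed headroom reservation for KV cache and CUDA buffers.

```python
# Estimate how many of a 70B model's layers fit in 12 GB of VRAM
# when partially offloading. All constants are rough assumptions.

TOTAL_LAYERS = 80    # Llama-2-70B layer count
MODEL_GB = 48.0      # approximate q5_k_m file size
VRAM_GB = 12.0       # RTX 3060 12GB
RESERVED_GB = 1.5    # headroom for KV cache / CUDA buffers (a guess)

gb_per_layer = MODEL_GB / TOTAL_LAYERS
offloadable = int((VRAM_GB - RESERVED_GB) / gb_per_layer)
print(f"~{gb_per_layer:.2f} GB per layer, offload about {offloadable} layers")
```

So only around a fifth of the layers fit on the GPU, which is consistent with the ~1 token/s throughput reported above: the bulk of the model still runs from system RAM on the CPU.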
What tests have you evaluated this on?
I’m very interested in storytelling and RP