• 0 Posts
  • 15 Comments
Joined 1 year ago
Cake day: October 30th, 2023


  • candre23@alien.top to LocalLLaMA · 55B Yi model merges
    1 year ago

    It’s a new foundational model, so some teething pains are to be expected. Yi is heavily based on (directly copied, for the most part) llama2, but there are just enough differences in the training parameters that default llama2 settings don’t get good results. KCPP has already addressed the rope scaling, and I’m sure it’s only a matter of time before the other issues are hashed out.
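
    A minimal sketch of what that looks like in practice (my illustration, not part of the original comment): KCPP now reads the RoPE base from the GGUF metadata, but if you’re on a loader that doesn’t, you can set it by hand. The example below uses llama-cpp-python rather than KCPP, and assumes Yi’s published rope theta of 5,000,000 plus a hypothetical file name; check your model’s config.json before trusting those numbers.

```python
# Sketch: manually overriding RoPE parameters for a Yi GGUF when the loader
# does not pick them up automatically. All values are assumptions, not the
# commenter's settings.
from llama_cpp import Llama

llm = Llama(
    model_path="yi-34b.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,
    rope_freq_base=5_000_000.0,       # Yi's rope theta; llama2 defaults to 10,000
    rope_freq_scale=1.0,              # no linear scaling
)

out = llm("Q: Why do llama2 defaults break Yi?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```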



  • Yes, your GPU is too old to be useful for offloading, but you could at least still use it to accelerate prompt processing.

    With your hardware, you want to use koboldCPP, which uses models in GGML/GGUF format. You should have no issue running models up to 120b with that much RAM, but large models will be painfully slow (10+ minutes per response) running on CPU only. I’d recommend sticking to 13b models unless you’re incredibly patient.
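
    To make the “10+ minutes” claim concrete, here is a back-of-envelope estimate (my own sketch with assumed numbers, not a benchmark): CPU-only generation is roughly memory-bandwidth bound, so tokens per second is approximately RAM bandwidth divided by model size on disk.

```python
# Rough CPU-only speed estimate. All figures are illustrative assumptions:
# ~50 GB/s for dual-channel DDR4 and approximate Q4_K_M file sizes.
ram_bandwidth_gb_s = 50.0

models_gb = {
    "13b Q4_K_M": 8.0,
    "70b Q4_K_M": 40.0,
    "120b Q4_K_M": 65.0,
}

response_tokens = 500  # a longish reply

for name, size_gb in models_gb.items():
    tok_s = ram_bandwidth_gb_s / size_gb  # each token streams all weights once
    minutes = response_tokens / tok_s / 60
    print(f"{name}: ~{tok_s:.2f} tok/s, ~{minutes:.1f} min per {response_tokens}-token reply")
```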



  • The 3090 will outperform the 4060 several times over. It’s not even a competition - it’s a slaughter.

    As soon as you have to offload even a single layer to system memory (regardless of its speed), you cut your performance by an order of magnitude. I don’t care if you have screaming-fast DDR5 in 8 channels and a pair of the beefiest Xeons money can buy; your performance will fall off a cliff the minute you start offloading. If a 3090 is within your budget, that is the unambiguous answer.
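
    The cliff can be sketched with a simple bandwidth-only estimate (my own illustration with assumed numbers; it ignores the extra cost of actually computing the spilled layers on the CPU, so real slowdowns are worse): each generated token streams every weight once, and whatever sits in system RAM is read at RAM speed, so the slow pool quickly dominates per-token latency.

```python
# Bandwidth-only model of partially spilling weights to system RAM (assumed
# figures: ~900 GB/s for 3090-class VRAM, ~50 GB/s for dual-channel system
# RAM, ~20 GB for a 34b model at 4-bit). Not a benchmark.
gpu_bw_gb_s = 900.0
ram_bw_gb_s = 50.0
model_gb = 20.0

for frac_in_ram in (0.0, 0.1, 0.25, 0.5):
    gpu_time = model_gb * (1 - frac_in_ram) / gpu_bw_gb_s
    ram_time = model_gb * frac_in_ram / ram_bw_gb_s
    print(f"{frac_in_ram:>4.0%} of weights in system RAM: ~{1 / (gpu_time + ram_time):.1f} tok/s")
```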