As requested, this is the subreddit’s second megathread for model discussion. This thread will now be hosted at least once a month to keep the discussion updated and help reduce identical posts.

I also saw that we hit 80,000 members recently! Thanks to every member for joining and making this happen.


Welcome to the r/LocalLLaMA Models Megathread

What models are you currently using and why? Do you use 7B, 13B, 33B, 34B, or 70B? Share any and all recommendations you have!

Examples of popular categories:

  • Assistant chatting

  • Chatting

  • Coding

  • Language-specific

  • Misc. professional use

  • Role-playing

  • Storytelling

  • Visual instruction


Have feedback or suggestions for other discussion topics? All suggestions are appreciated and can be sent to modmail.

^(P.S. LocalLLaMA is looking for someone who can manage Discord. If you have experience modding Discord servers, your help would be welcome. Send a message if interested.)


Previous Thread | New Models

  • HvskyAI@alien.top
    1 year ago

    I’m late to the party on this one.

    I’ve been loving the 2.4BPW EXL2 quants from Lone Striker recently, specifically using Euryale 1.3 70B and LZLV 70B.

    Even at the smaller quant, they’re very capable, and leagues ahead of smaller models in terms of comprehension and reasoning. Min-P sampling parameters have been a big step forward, as well.
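For anyone who hasn't looked at how Min-P works, here is a minimal sketch in plain Python. It assumes the commonly described rule: keep every token whose probability is at least `min_p` times the top token's probability, then renormalize and sample. The function names and the default of 0.1 are illustrative and not taken from any particular backend.

```python
import math
import random


def min_p_filter(logits, min_p=0.1):
    """Return {token_index: probability} after min-p filtering.

    A token survives if its probability is at least min_p times the
    probability of the most likely token. Survivors are renormalized.
    """
    # Numerically stable softmax over the raw logits.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The cutoff scales with the model's confidence in its top choice.
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}

    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


def sample_min_p(logits, min_p=0.1, rng=random):
    """Draw one token index from the min-p filtered distribution."""
    kept = min_p_filter(logits, min_p)
    tokens = list(kept)
    weights = [kept[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Because the cutoff is relative to the top token, a confident distribution prunes most of the tail while a flat one keeps more candidates, which is the usual argument for it over a fixed top-p.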

    The only downside I can see is the limited context length on a single 24GB VRAM card. Perhaps further testing of Nous-Capybara 34B at 4.65BPW on EXL2 is in order.

    • FullOf_Bad_Ideas@alien.top
      1 year ago

      Remember to try the 8-bit cache if you haven’t yet; it should get you to 5.5k tokens of context.

      You can get around 10-20k of context with 4bpw yi-34b 200k quants on a single 24GB card.
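As a rough illustration of why the 8-bit cache stretches context, here is a back-of-the-envelope sketch. It assumes Llama-2-70B-style shapes (80 layers, 8 KV heads, head dimension 128), counts only the key/value cache, and ignores weights, activations, and framework overhead. The 1 GiB of spare VRAM is a made-up figure for the example, not a measurement.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache needed per token: one key and one value vector
    per KV head, per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_context_tokens(free_vram_bytes, per_token_bytes):
    """How many tokens of cache fit in the given spare VRAM."""
    return free_vram_bytes // per_token_bytes


# Assumed Llama-2-70B-style shapes (grouped-query attention).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

fp16_per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 2)
int8_per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 1)

# Hypothetical: 1 GiB left over after loading the quantized weights.
spare = 1 * 1024**3

print(fp16_per_token, max_context_tokens(spare, fp16_per_token))
print(int8_per_token, max_context_tokens(spare, int8_per_token))
```

Halving the bytes per cached value roughly doubles the context that fits in the same leftover memory. Real numbers depend on the backend and on how much VRAM the weights actually leave free.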