• llama_in_sunglasses@alien.top
    1 year ago

    With Llama-2-70b-chat-E8P-2Bit from their model zoo, QuIP# seems fairly promising. I'd have to try Llama-2-70b-chat in EXL2 at 2.4 bpw to compare, but so far this model doesn't really feel like a 2-bit model. I'm impressed.
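    For a sense of scale, the weight-memory gap between a nominal 2 bpw and 2.4 bpw is easy to estimate. This is a minimal sketch, assuming a round 70e9 parameter count with every weight stored at the stated bit width; it ignores embeddings or norms kept at higher precision, codebooks, and runtime overhead, so real files will differ somewhat.

```python
# Rough weight-only memory estimate for a quantized model.
# Assumptions: ~70e9 parameters, all stored at the given bits per weight.
# Ignores higher-precision layers, codebooks/scales, and runtime buffers.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

N_PARAMS = 70e9  # assumed round figure for Llama-2-70b

for bpw in (2.0, 2.4, 16.0):
    print(f"{bpw:>4} bpw: {weight_gib(N_PARAMS, bpw):6.1f} GiB")
```

    Under these assumptions it comes out to roughly 16.3 GiB at 2 bpw versus 19.6 GiB at 2.4 bpw (and about 130 GiB at fp16), so the difference is a few GiB before any KV cache is counted.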

    • a_beautiful_rhind@alien.top
      1 year ago

      According to the issue about this in the exllamav2 repo, QuIP# used more memory and was slower than EXL2. How much context can you fit?
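      Once the weights are loaded, how much context fits is mostly a question of KV-cache size. This is a minimal sketch, assuming the published Llama-2-70b shape (80 layers, 8 key/value heads under grouped-query attention, head dimension 128) and a cache stored in fp16; it does not model any backend-specific buffers, which will add to the total.

```python
# Rough KV-cache size estimate for a decoder-only transformer.
# Assumed Llama-2-70b shape: 80 layers, 8 key/value heads (grouped-query
# attention), head dimension 128, cache stored in fp16 (2 bytes per element).

def kv_cache_bytes(seq_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # Leading factor 2 covers both the key and the value tensor in each layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

for ctx in (4096, 8192, 16384):
    print(f"{ctx:>6} tokens: {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

      Under these assumptions each token costs 320 KiB of cache, so 4096 tokens is 1.25 GiB and 16384 tokens is 5 GiB, on top of the weights and whatever working memory the backend allocates.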