• 1 Post
  • 3 Comments
Joined 1 year ago
cake
Cake day: October 24th, 2023

help-circle

  • I have a RTX 4090, 96GB of RAM and a i9-13900k CPU, and I still keep going back to 20b (4-6bpw) models due to the awful performance of 70b models, which 2.4bpw is supposed to fully fit the VRAM in… even using Exllama2…

    What is your trick to get better performance? If I don’t use a small lame context of 2048, the speed of generating is actually un-usable (under 1 token/sec), what context are you using and what settings? Thank you.