• 1 Post
  • 68 Comments
Joined 1 year ago
Cake day: November 8th, 2023

  • There are actually TSVs for 3D V-Cache on the AMD 7900 series, but AMD doesn’t use them, presumably because the stacked cache would make the chip run hotter, so they’d have to downclock it.

    But I think it would be a great candidate for an ML card. Not for directly accelerating models, but for keeping intermediate calculations in cache so all of the VRAM bandwidth is preserved for streaming model weights (rough math below).
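
    To make the bandwidth argument concrete, here’s a back-of-the-envelope sketch. The reasoning (tokens/sec ≈ bandwidth ÷ bytes read per token) is the standard one for bandwidth-bound generation; the model size and overhead figures are illustrative assumptions, not measurements.

    ```python
    # Single-batch token generation is memory-bandwidth bound: each token
    # reads roughly the full weights once, so any bandwidth spent on
    # intermediates is bandwidth not spent streaming weights.
    weights_gb = 24      # assumption: a mid-size model at ~4-bit quantization
    bandwidth_gbs = 960  # 7900 XTX theoretical memory bandwidth

    ideal_tps = bandwidth_gbs / weights_gb
    print(f"ideal ceiling: {ideal_tps:.0f} tok/s")

    # If intermediates (KV-cache reads, activations) eat 20% of bandwidth:
    print(f"with 20% overhead: {0.8 * bandwidth_gbs / weights_gb:.0f} tok/s")
    ```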







  • Another thing to note is that the exllamav2 backend is “special” because its context takes up less VRAM than the context in other backends. Say the weights take 18GB and your context takes up 6GB with a GGUF model; with exllama’s 8-bit cache, that same context only takes 3GB.

    There are other complications, like the prompt-processing batch size, but that’s the gist of it (rough sizing math in the sketch below).

    This makes a dramatic difference when the context gets huge. I’d prefer to use koboldcpp myself, but I just can’t really squeeze it onto my 3090 without excessive offloading.
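
    For reference, here’s how the context (KV cache) footprint scales. The formula is the standard one for transformer KV caches; the layer/head/dimension numbers are illustrative assumptions for a GQA model, not any specific model’s config.

    ```python
    # KV cache size: K and V tensors for every layer, for every cached token.
    def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
        # 2x for the separate K and V tensors
        return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

    # Assumed dimensions: 80 layers, 8 KV heads, head_dim 128, 32k context
    print(f"FP16 cache:  {kv_cache_gb(80, 8, 128, 32768, 2):.1f} GB")
    print(f"8-bit cache: {kv_cache_gb(80, 8, 128, 32768, 1):.1f} GB")  # half the VRAM
    ```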



  • “I’m sick of Nvidia’s VRAM business model”

    At the top end, they are actually limited by how much memory they can physically hang off the die (48GB for current silicon, or 196GB(?) for the interposer silicon).

    But yeah, below that it’s price gouging. What are ya gonna do, buy an Arc?

    AMD is going along with this game too. You’d see a lot more 7900s on this sub, and on GitHub, if AMD let their board partners double up the VRAM to 48GB.
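
    Some quick arithmetic on why that 48GB would matter. The 4-bit figure is an assumption about typical quantization, and the sizes ignore context and runtime overhead:

    ```python
    # Quantized weight footprint: params (billions) * bits / 8 = GB
    def weights_gb(params_b, bits):
        return params_b * bits / 8

    for params in (13, 34, 70):
        gb = weights_gb(params, 4)  # assume ~4-bit quantization
        fit = "fits in 24GB" if gb < 24 else "needs 48GB" if gb < 48 else "too big"
        print(f"{params}B @ 4-bit ≈ {gb:.0f} GB -> {fit}")
    ```

    A 70B at 4-bit is ~35GB: out of reach for a 24GB card, comfortable on a 48GB one.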