Question about the possibility of running large models on a 3070 Ti with 32 GB of RAM: what's the best way to run them, if it's possible, without quality loss?
Speed isn't an issue; I just want to be able to run such models ambiently.
Use llama.cpp and offload some of the layers to VRAM; you may be able to run a 70B model, depending on the quantization.
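A minimal sketch of what that looks like with the llama-cpp-python bindings (the model filename and the n_gpu_layers value here are placeholders, not recommendations; tune the layer count until the 3070 Ti's VRAM is full and the rest stays in system RAM):

```python
# Assumes llama-cpp-python installed with CUDA support:
#   pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-70b.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=20,  # layers offloaded to VRAM; raise until you run out
    n_ctx=2048,       # context window; larger contexts need more memory
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

The same partial offload is available from the llama.cpp CLI via the `-ngl` / `--n-gpu-layers` flag.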