Never use the Q8_0 versions of GGUFs unless most/all of the model can comfortably fit into your VRAM. The Q6_K version is much smaller and almost the same quality.
For your setup, I would use mythomax-l2-13b.Q4_K_M.gguf.
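If you want to see how that VRAM trade-off looks in practice, here's a rough sketch of loading it with llama-cpp-python; the path and layer count are just placeholders for your setup, not exact numbers:

```python
from llama_cpp import Llama

# Offload as many layers to the GPU as your VRAM allows;
# whatever doesn't fit stays in system RAM (slower, but it still runs).
llm = Llama(
    model_path="./mythomax-l2-13b.Q4_K_M.gguf",
    n_gpu_layers=35,   # tune this down if you hit out-of-memory errors
    n_ctx=4096,        # context window; bigger also eats more VRAM
)
```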
I’d think so? Why not just try it?
More context would be helpful.
If I had to guess, are you typing in a password or something sensitive? Some programs hide your input in the command prompt when you're typing something like a password. In that case, just type the word and hit enter; it doesn't show on screen, but it's there.
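As a quick illustration of that behavior, Python's getpass module does exactly this; your keystrokes are read but never echoed:

```python
import getpass

# Nothing appears on screen while you type, but the input is captured.
password = getpass.getpass("Password: ")
print(f"Got {len(password)} characters.")
```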
Usually it’s going to depend on what format models you’re using.
I’m a big GGUF user, so I would use https://github.com/abetlen/llama-cpp-python.git.
If you’re a big GPTQ user, you might use https://github.com/PanQiWei/AutoGPTQ or https://github.com/turboderp/exllama.
If you’re just looking for non-quantized models, or you simply prefer it anyway, you could use https://huggingface.co/docs/transformers/index.
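For the transformers route, a minimal sketch might look like this (the model name is just an example, and device_map="auto" assumes you have accelerate installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Gryphe/MythoMax-L2-13b"  # example; swap in whatever model you use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Hello, how are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```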
Yeah, there’s so much to learn; I’m still figuring a lot out too.
Good tip for settings: Play around mostly with temperature, top-p, and min-p.
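If you're using llama-cpp-python like me, those knobs map directly to generation parameters (min_p needs a reasonably recent version; the values here are just starting points, not recommendations):

```python
from llama_cpp import Llama

llm = Llama(model_path="./mythomax-l2-13b.Q4_K_M.gguf")  # as above

output = llm(
    "Write a short greeting.",
    max_tokens=64,
    temperature=0.8,  # higher = more varied output
    top_p=0.95,       # nucleus sampling: keep the top 95% of probability mass
    min_p=0.05,       # drop tokens under 5% of the most likely token's probability
)
print(output["choices"][0]["text"])
```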