I am talking about this particular model:
https://huggingface.co/TheBloke/goliath-120b-GGUF
I specifically use: goliath-120b.Q4_K_M.gguf
I can run it on runpod.io on an A100 instance at a "humane" speed, but it is far too slow for generating long-form text.
These are my settings in text-generation-webui:
Any advice? Thanks.
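For comparison, it can help to benchmark the same GGUF outside the webui by loading it directly with llama.cpp and offloading all layers to the GPU. The command below is only a sketch; the binary name, paths, and flag values are assumptions and not taken from this post (older llama.cpp builds call the binary `./main` instead of `llama-cli`):

```shell
# Hypothetical llama.cpp invocation -- adjust the model path to your setup.
# -ngl 99 offloads every layer to the GPU; a Q4_K_M 120B model (~70 GB)
# should fit in a single A100 80GB.
./llama-cli \
  -m goliath-120b.Q4_K_M.gguf \
  -ngl 99 \
  -c 4096 \
  -t 8 \
  -n 512 \
  -p "Write a short story about a lighthouse keeper."
```

If this runs noticeably faster than the webui, the bottleneck is likely in the webui's loader settings (e.g. `n-gpu-layers` left too low) rather than the hardware.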
Which mode do you use? Chat, chat-instruct, or instruct?