I am talking about this particular model:
https://huggingface.co/TheBloke/goliath-120b-GGUF
I specifically use: goliath-120b.Q4_K_M.gguf
I can run it on runpod.io on this A100 instance at a tolerable ("humane") speed, but it is way too slow for generating long-form text.
These are my settings in text-generation-webui:
Any advice? Thanks
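For concreteness, here is a minimal sketch of loading the same GGUF file with full GPU offload through llama-cpp-python (one of the loaders text-generation-webui wraps). The parameter values below are assumptions for illustration, not my exact webui settings:

```python
from llama_cpp import Llama

# Hypothetical sketch: load goliath-120b.Q4_K_M.gguf directly with llama-cpp-python.
# n_gpu_layers=-1 asks for every layer to be offloaded to the GPU (the A100);
# anything that doesn't fit falls back to CPU, which is usually what kills speed.
llm = Llama(
    model_path="goliath-120b.Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload as many layers as possible to the GPU
    n_ctx=4096,        # context window; assumed value, adjust to your use case
    n_batch=512,       # prompt-processing batch size; assumed value
)

# Simple completion call to check generation speed.
out = llm("Write a short scene about a lighthouse keeper.", max_tokens=256)
print(out["choices"][0]["text"])
```

The main thing the sketch is meant to highlight is the layer-offload setting: if only part of the model sits on the GPU, long-form generation slows down dramatically.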
I mean, it makes sense. The values were simply chosen for being a reasonable window at the time.
There was nothing hard-coded about them; they were just a range of values that had been set for the UI.
It certainly is interesting though.