Where and how to run Goliath 120b GGUF with good performance?

abandonedexplorer@alien.top · 2 years ago

Where and how to run Goliath 120b GGUF with good performance?

whtne047htnb@alien.top · 2 years ago

The GGUF one has 140 layers, more than what the textgen UI supports (128). So the slowness may be because you are using CPU for some layers (check your terminal output when loading the model). But you can manually change the source code and set the max value of the n_gpu_layers slider to a higher value (just grep for it).
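
If you don't have grep handy, here is a rough Python sketch of the same search. The checkout path `text-generation-webui` and the `.py`-only filter are assumptions for illustration; point it at wherever your install lives. It only lists the lines that mention `n_gpu_layers`, so you still have to find the slider definition and edit its maximum by hand.

```python
import os


def find_setting(root, needle="n_gpu_layers", exts=(".py",)):
    """Return (path, line_number, line) for every source line mentioning needle."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            # errors="ignore" keeps the scan going past oddly encoded files
            with open(path, encoding="utf-8", errors="ignore") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical checkout location; replace with your own install directory.
    for path, lineno, line in find_setting("text-generation-webui"):
        print(f"{path}:{lineno}: {line}")
```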

Evening_Ad6637@alien.top · 2 years ago

This is the only helpful answer, because it is the right one.

kruk2@alien.top · 2 years ago

Or open the UI, go to the model page, right-click the layers slider -> Inspect Element,
and change the max value of the input field from 128 to 256.
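
To see why raising the max matters, here is a tiny sketch of the arithmetic using the numbers from this thread (140 layers, slider max of 128 or 256). The function name `split_layers` is made up for illustration, and it assumes the UI simply clamps the request to the slider max and that whatever is not offloaded runs on the CPU.

```python
def split_layers(total_layers, requested_gpu_layers, slider_max):
    """Return (gpu_layers, cpu_layers), assuming the UI clamps to slider_max."""
    gpu = min(requested_gpu_layers, slider_max, total_layers)
    return gpu, total_layers - gpu


print(split_layers(140, 140, 128))  # (128, 12): 12 layers left on the CPU
print(split_layers(140, 140, 256))  # (140, 0): everything fits on the GPU
```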

abandonedexplorer@alien.top · 2 years ago

Can't believe that worked lol! Thank you so much. The speed increased significantly!

MINIMAN10001@alien.top · 2 years ago

I mean, it makes sense. The values were simply chosen for being a reasonable range at the time.

There was nothing hard-coded about them; they were simply a range of values that had been set for the UI.

It certainly is interesting though.