I’ve been checking out the latest models from people tweaking Goliath 120B. I found this one to be the best by far with that issue and the strange spelling stuff. Might be worth giving it a try to compare for yourself: https://huggingface.co/LoneStriker/Tess-XL-v1.0-4.85bpw-h6-exl2 (LoneStriker has other bpw variants as well)
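If you want to grab it programmatically rather than through the website, a minimal sketch using huggingface_hub (the repo ID comes from the link above; the local directory is just an example path):

```python
# Minimal sketch: download the EXL2 quant with huggingface_hub.
# Repo ID is from the link above; local_dir is an arbitrary example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="LoneStriker/Tess-XL-v1.0-4.85bpw-h6-exl2",
    local_dir="models/Tess-XL-v1.0-4.85bpw-h6-exl2",
)
```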
Murky-Ladder8684@alien.top to LocalLLaMA • Exllama outside of text generation webui? • 2 years ago
Check out turbo’s project: https://github.com/turboderp/exui
He just put it up not long ago and has speculative decoding working in it. I tried it with Goliath 120B 4.85bpw EXL2 and was getting 11-13 t/s vs 6-8 t/s without it. It’s barebones but works.
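For context on why speculative decoding gives that kind of speedup, here is a toy, library-agnostic sketch of the idea (this is not exui’s actual code; `draft_next` and `target_argmax` are hypothetical model wrappers): a small draft model proposes a few tokens cheaply, and the large model verifies them in a single forward pass, keeping the longest agreeing prefix.

```python
# Toy illustration of (greedy) speculative decoding, not exui's implementation.
# draft_next() and target_argmax() are hypothetical stand-ins for model calls.

def speculative_step(prompt_tokens, draft_next, target_argmax, k=4):
    """Propose k tokens with a cheap draft model, then verify them
    with one forward pass of the large target model."""
    # 1. Draft model guesses k tokens autoregressively (cheap).
    guesses = []
    ctx = list(prompt_tokens)
    for _ in range(k):
        tok = draft_next(ctx)
        guesses.append(tok)
        ctx.append(tok)

    # 2. Target model scores prompt + guesses in one forward pass and
    #    returns its own greedy choice at each of the k positions.
    target_choices = target_argmax(prompt_tokens, guesses)

    # 3. Accept the longest prefix where draft and target agree, then
    #    take the target's token at the first disagreement.
    accepted = []
    for guess, choice in zip(guesses, target_choices):
        if guess == choice:
            accepted.append(guess)
        else:
            accepted.append(choice)
            break
    return accepted  # 1..k tokens per large-model pass instead of just 1
```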
In the instructions on GitHub it said to use mono 24000 Hz WAV files. Double-check the info though.
Murky-Ladder8684@alien.top to LocalLLaMA • Is it worth using a bunch of old GTX 10 series cards (like 1060, 1070, 1080) for running local LLMs? • 2 years ago
Those Nvidia GPUs didn’t have tensor cores yet; I believe those started with the 20xx series. I’m not sure how much that impacts inference vs. training/fine-tuning, but it’s worth doing more research. From what I’ve gathered the answer is “no”, unless you use a 10xx card for things like monitor output, TTS, or other smaller co-LLM uses that you don’t want taking VRAM away from your main LLM GPUs.
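As a quick way to check: tensor cores require CUDA compute capability 7.0 or higher (Volta/Turing and newer), while Pascal cards like the 10xx series report 6.x. A small sketch using PyTorch, assuming it is installed with CUDA support:

```python
# Report whether each visible GPU has tensor cores
# (compute capability 7.0+, i.e. Volta/Turing or newer).
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    has_tensor_cores = major >= 7
    print(f"{name}: sm_{major}{minor}, tensor cores: {has_tensor_cores}")
```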
Murky-Ladder8684@alien.top to LocalLLaMA • How much more stupid is the 120B goliath Q3_K_M than the larger options? • 2 years ago
For comparison’s sake, the EXL2 4.85bpw version runs at around 6-8 t/s on 4x3090s at 8k context, and that’s at the lower end.
Murky-Ladder8684@alien.top to LocalLLaMA • How much more stupid is the 120B goliath Q3_K_M than the larger options? • 2 years ago
4x3090s will run it at over 4 bits per weight.
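A rough back-of-the-envelope check of why that fits (assuming Goliath is roughly 118B parameters; the KV-cache headroom figure is a ballpark, not a measurement):

```python
# Back-of-the-envelope VRAM estimate for a ~118B model at 4.85 bits/weight.
params = 118e9          # approximate parameter count for Goliath 120B
bpw = 4.85              # EXL2 bits per weight
weights_gb = params * bpw / 8 / 1e9
total_vram_gb = 4 * 24  # four RTX 3090s

print(f"Weights: ~{weights_gb:.1f} GB of {total_vram_gb} GB total")
# ~71.5 GB for weights, leaving roughly 24 GB across the four cards
# for activations and the 8k-token KV cache.
```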
If you learn AutoGen you could assign each model to a different agent and have them interact (a rough sketch of that setup is below). If using the same model and having multiple characters talk is your thing, then the SillyTavern group chat option is the way to go.
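A minimal sketch of the AutoGen idea, assuming two local OpenAI-compatible endpoints (the URLs, ports, and model names are placeholders for your own servers, and the exact config key names vary between pyautogen versions):

```python
# Minimal sketch: two AutoGen agents, each backed by a different local model.
# Endpoints, ports, and model names are placeholders, not real servers.
from autogen import ConversableAgent

config_a = [{"model": "goliath-120b", "base_url": "http://localhost:5000/v1", "api_key": "not-needed"}]
config_b = [{"model": "tess-xl", "base_url": "http://localhost:5001/v1", "api_key": "not-needed"}]

agent_a = ConversableAgent(
    name="alice",
    llm_config={"config_list": config_a},
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,   # stop the back-and-forth after a few turns
)
agent_b = ConversableAgent(
    name="bob",
    llm_config={"config_list": config_b},
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,
)

# Kick off a conversation between the two agents.
agent_a.initiate_chat(agent_b, message="Let's compare notes on local LLM setups.")
```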