I’ve been encountering a repetition issue with models like Goliath 120b and Xwin 70b on SillyTavern + OpenRouter. While I understand that changing models can have a significant impact, I’m puzzled by the repetition problem. Despite my efforts to find online resources for correct settings, my searches for Airoboros 70b, Xwin 70b, lzlv 70b, and others have been in vain.
I came across posts on this subreddit addressing similar concerns, but unfortunately they lacked solutions. One suggestion was to “use the Shortwave preset,” but it seems to be nonexistent. Unsure of what I might be overlooking, I’m reaching out here for help. The 120b model should theoretically outperform the 7b/13b models, so I suspect there’s a configuration issue on my end.
If anyone could provide insights or share the correct settings for these models, it would greatly help not only me but also future users facing the same issue. Let’s compile a comprehensive guide here so that anyone searching the internet for a solution can find this post and get the answers they need. Thank you in advance for your assistance!
PS: MythoMax 13B seems to be the best model, because it’s the only one that actually works…
I run my models locally only, and my best experience has been using mirostat to combat repetition and samey regenerations. Before that I used contrastive search with some success.
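For anyone unfamiliar, contrastive search (as I understand it) scores each top-k candidate token by trading off the model’s confidence against how similar that candidate is to what’s already been generated, then picks the best. A rough, self-contained Python sketch of that scoring; the names and shapes are mine, not any specific library’s API:

```python
import numpy as np

def contrastive_search_step(probs, cand_ids, cand_embs, prev_embs, alpha=0.6):
    """One contrastive-search selection step (rough sketch).

    probs:     model probabilities over the full vocab at this step
    cand_ids:  the top-k candidate token ids
    cand_embs: hidden-state embedding for each candidate
    prev_embs: embeddings of tokens already generated
    alpha:     0 = plain greedy decoding; higher = stronger anti-repetition
    """
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_id, best_score = cand_ids[0], -np.inf
    for tid, emb in zip(cand_ids, cand_embs):
        # Degeneration penalty: how similar is this candidate to anything
        # we've already emitted? High similarity suggests repetition.
        penalty = max(cosine(emb, p) for p in prev_embs)
        score = (1 - alpha) * probs[tid] - alpha * penalty
        if score > best_score:
            best_id, best_score = tid, score
    return best_id
```

The alpha knob is the whole trade-off: at 0 it reduces to greedy decoding, and higher values punish candidates that resemble earlier output harder.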
I have to say, though, I’m not sure if mirostat would be a good solution through OpenRouter. Doesn’t mirostat keep some kind of per-session state? It definitely behaves as if it tracks generated tokens somehow and tries to avoid them in the future, or something like that.
Anyway, for 70b Xwin & lzlv my settings have been simple: everything at default values (1 or 0), mirostat mode=2, tau=2-3, eta=1. This gets me great responses, zero repetition, high variety when regenerating, and not too many hallucinations. These settings seem pretty stable. I sometimes tweak tau or raise/lower the temp, but I always end up back at those settings eventually.
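To be clear, I don’t think mirostat literally caches tokens; as far as I understand the v2 algorithm (the one llama.cpp implements), it’s a feedback loop on per-token “surprise”. Rough Python sketch of a single step, variable names mine:

```python
import numpy as np

def mirostat_v2_step(logits, mu, tau=2.5, eta=1.0, rng=np.random.default_rng()):
    """One mirostat v2 sampling step (rough sketch, not a reference impl).

    tau: target surprise; lower = safer, more predictable text
    eta: learning rate of the feedback loop
    mu:  running cutoff, conventionally initialised to 2 * tau
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Truncate tokens whose surprise (-log2 p) exceeds the current cutoff.
    keep = -np.log2(probs) <= mu
    if not keep.any():
        keep[np.argmax(probs)] = True  # always keep at least the top token

    # Renormalise the survivors and sample.
    p = np.where(keep, probs, 0.0)
    p /= p.sum()
    token = rng.choice(len(p), p=p)

    # Feedback: if the sampled token was more predictable than tau, the error
    # is negative and mu grows, letting in more unlikely tokens next step.
    # That widening is (I believe) what fights repetition and samey regens.
    error = -np.log2(p[token]) - tau
    mu = mu - eta * error
    return token, mu
```

Since mu starts at 2 * tau and then self-adjusts, tau is really the only knob worth tuning, which matches my experience above.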
For the new 34b Yi fine-tunes, though, these settings don’t work. It’s like I’m back in the early days of Llama 2, with exactly the problems you mentioned: the models start to loop and repeat, not just within the same response but repeating previous responses verbatim as well, reusing the same phrases again and again, not knowing when to stop, etc. For those I haven’t found good, stable settings so far, no matter what I change (they do seem to prefer low temp), which is frustrating because they’re great otherwise. So mirostat isn’t a magic bullet, it seems.
Can’t say anything about Goliath, unfortunately (haven’t used it).
I’m loving min_P + dynamic temperature. Feels like the only sampler setup I need (rough sketch of what the two do below).
These are better than Shortwave, which was good in itself.
Another one to try is mirostat.
See what OpenRouter lets you use out of that.
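To picture what those two samplers do, here’s a rough Python sketch of my understanding. The entropy-to-temperature mapping is just one common formulation of dynamic temperature, and the parameter names are mine, not any particular backend’s API:

```python
import numpy as np

def min_p_dynatemp_sample(logits, min_p=0.05, min_temp=0.5, max_temp=1.5,
                          rng=np.random.default_rng()):
    """min_P filtering + entropy-based dynamic temperature (rough sketch).

    min_P: drop tokens whose probability is below min_p * (top token's prob),
    so the cutoff scales with how confident the model is.
    Dynamic temperature: pick a temperature between min_temp and max_temp
    based on the normalised entropy of what survived, so confident
    distributions stay sharp and flat ones get more randomness.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # min_P: keep only tokens within min_p of the top token's probability.
    keep = probs >= min_p * probs.max()
    surv = np.where(keep, probs, 0.0)
    surv /= surv.sum()

    # Map normalised entropy of the survivors onto [min_temp, max_temp].
    nz = surv[surv > 0]
    if len(nz) > 1:
        entropy = -(nz * np.log(nz)).sum() / np.log(len(nz))
    else:
        entropy = 0.0  # only one candidate left: fully confident
    temp = min_temp + (max_temp - min_temp) * entropy

    # Re-apply the chosen temperature to the surviving logits and sample.
    scaled = np.where(keep, logits / temp, -np.inf)
    p = np.exp(scaled - scaled.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)
```

The nice property of min_P is that the pool shrinks automatically when the model is confident and widens when it isn’t, which (as I understand it) is why people run it at higher temps than they’d dare with top_P.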