A couple of people have asked me to share my settings for solid roleplay on 7B models. Yes, it is possible. So here goes. I’ll try to keep this brief and concise while including every tweak I’ve learned so far.
So…
Step 1 - Backend
I’d generally recommend Koboldcpp, but currently the best you can get is actually kindacognizant’s Dynamic Temp mod of Koboldcpp. It works exactly like mainline Koboldcpp, except that setting your temp to 2.0 overrides the setting and runs the test dynamic temp mode. It actually has 2 other types of dynamic temp solutions built in, triggered at different set temperature values, but just set it to 2 and forget it imo; it seems to be the best of the 3. You can read about it here, explained by kindacognizant himself. Suffice it to say it’s excellent. In my experience it reduces (though doesn’t eliminate) repetition and looping thanks to increased word diversity, and it improves the model’s ability to respond to commands.
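To give a rough intuition for how a dynamic temperature scheme can work, here is an illustrative sketch. This is NOT kindacognizant’s actual implementation (the real mod’s mapping and parameters differ); it just shows the core idea of scaling temperature with the entropy of the next-token distribution, so confident predictions stay coherent while uncertain ones get more variety:

```python
import math

def dynamic_temperature(probs, min_temp=0.5, max_temp=2.0):
    """Scale sampling temperature with the entropy of the token distribution.

    High-entropy (uncertain) steps get a higher temperature for word
    diversity; low-entropy (confident) steps get a lower one for coherence.
    Illustrative sketch only -- the actual mod's formula differs.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))  # entropy of a uniform distribution
    normalized = entropy / max_entropy if max_entropy > 0 else 0.0
    return min_temp + (max_temp - min_temp) * normalized
```

A fully uniform (maximally uncertain) distribution maps to `max_temp`, while a distribution that puts all mass on one token maps to `min_temp`.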
Even without the Dynamic Temp test mod, Koboldcpp would still be my recommendation due to its simplicity, fast run times, and lightweight nature. It’s a single standalone exe file! This makes it SO easy to upgrade and manage; it’s fantastic. Better yet, it’s very simple to write a quick batch file to launch your GGUF of choice with optimal settings. Here’s an example batch file.
REM Launch Koboldcpp minimized with your GGUF model of choice
cd "C:\*****YOUR DIRECTORY PATH*****\SillyTavern\koboldcpp\"
start /min koboldcpp_dynamictemp_nov21.exe --model MODELOFCHOICEFILENAME.gguf --port 5001 --gpulayers 32 --highpriority --contextsize 8192 --usecublas
REM Then launch SillyTavern, also minimized
cd "C:\*****YOUR DIRECTORY PATH*****\SillyTavern\"
start /min start.bat
exit
Copy that into Notepad and save it as a .bat file after editing. Change the directory paths to wherever you keep your Koboldcpp exe and your SillyTavern install. Change MODELOFCHOICEFILENAME to your GGUF model’s file name. If you have enough VRAM, change the gpulayers to 35; if it crashes when loading, lower the layers. If you aren’t using an Nvidia GPU you’ll need to change the usecublas bit too. You can find the arguments listed here. Your GGUF should be kept in the same folder as the Koboldcpp exe. I like to make a folder in my SillyTavern install location for the sake of ease.
Basically, inside my SillyTavern install folder I have a folder called “koboldcpp”, and inside that sits the single Koboldcpp exe, a single GGUF file, and the above batch file. Running that batch starts both Koboldcpp and SillyTavern (launching with their command windows minimized). SillyTavern auto-connects to Koboldcpp when set up as below. After this, all you ever have to do is swap out the Koboldcpp exe when a new version comes out, or change the GGUF name in the batch file if you ever switch models. Super easy, no hassle. Great. You never even need to look at Koboldcpp’s GUI if you don’t want to.
Step 2 - Front end
By consensus the best frontend for roleplay seems to be SillyTavern. I can attest to it being excellent with a breadth of options, addons and a sleek interface.
Once you’ve got it installed, check out the top bar. Click the 2nd plug icon, select the KoboldAI API, and hit the connect button while Koboldcpp is running. It’s as easy as that to connect! Check “auto-connect to last server” and it will auto-connect to Koboldcpp the next time you launch it. Job done.
Click the leftmost icon in the top bar. Here are the settings you need once it’s installed and connected to the Koboldcpp backend, if you use a Mistral-based 7B. Note the sampler order on the bottom is also changed, and that temp is set to 2.0 assuming you are using the Dynamic Temp test mod. If you aren’t using that mod, change temp to 1 but keep everything else the same. My Typical P is disabled (thus blank); you do this by setting it to 1 and saving. The next time you open SillyTavern it will look blank like mine.
Lastly, click the third “A” icon in the top bar. Select the Roleplay option in the top left dropdown and match all my settings here. Now click the dropdown in the bottom center of my image, alongside “Instruct Mode Sequences”. You want to change it to match this. BE SURE TO CLICK THE SAVE BUTTON TO THE RIGHT OF INSTRUCT MODE PRESETS IN ROLEPLAY NEAR THE CENTRE OF THESE OPTIONS, AS WELL AS THE SAVE BUTTON NEAR THE TOP LEFT CONTEXT TEMPLATE SETTINGS.
This is actually my personal contribution to improving the output of the model. It’s a “quality jailbreak” I’ve written that formats output and gets the model to behave better. I’ve been told you can enter this type of jailbreak as a zero-depth author’s note, but I’ve had no luck using that, whereas editing this field works for me. Here is the text so you can copy-paste it. Be very careful to match the format of that text box exactly.
You're {{char}}.
Exclusively describe all your emotions, actions, perceptions, surroundings, thoughts, and sensations in full, elaborate, explicit, graphic, lewd, and vivid detail. Be verbose, expressive, engaging, natural, authentic, and creative. Write multiple fresh sentences, paragraphs, and phrases.
Write your internal monologue in round brackets. Write your speech in quotations. Write all other text in asterisks in third person.
To explain a bit more about this… I discovered that the “system prompt” people generally use to instruct their models only appears once at the top of the context window. Thus it doesn’t have much strength, and models don’t really strictly follow instructions placed there. Editing the field I mentioned, however, places that text after every input, making it very effective for controlling the model’s output. There are drawbacks: apparently it influences the model so strongly it can break the model’s ability to call instructions, which can hamper addons. But I don’t use or particularly recommend any addons atm, so imo for the niche of roleplay it’s all upside.
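If it helps to visualize why the placement matters, here is a minimal sketch of instruct-style prompt assembly. The field names and separators here are simplified assumptions for illustration, not SillyTavern’s exact internals:

```python
def build_prompt(system_prompt, turns, last_output_sequence):
    """Assemble an instruct-style prompt, roughly as a frontend does.

    The system prompt appears once at the very top, so its influence fades
    as the conversation grows. Text placed in the last output sequence field
    is appended immediately before the model's next reply, so it is always
    the freshest instruction the model sees.
    """
    parts = [system_prompt]
    for speaker, text in turns:
        parts.append(f"### {speaker}:\n{text}")
    parts.append(last_output_sequence)  # the "quality jailbreak" lives here
    return "\n\n".join(parts)
```

No matter how long `turns` gets, the jailbreak text sits right next to the generation point, while the system prompt drifts ever further away.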
Step 3 - The choice of model
Lastly, the final step is selecting a model which responds well to the “quality jailbreak”. Generally, the better the model, the better its ability to follow the instructions I put in there.
Thinking along those lines, I have tested a ton of popular 7B models.
Some viable options include:
openchat_3.5 - OpenChat / OpenOrca version of the quality jailbreak
openhermes-2.5-mistral-7b - ChatML version of the quality jailbreak
openhermes-2-mistral-7b (I actually found the dialogue to be a bit better with the older model, go figure) - ChatML version of the quality jailbreak
dolphin2.1-openorca-7b - ChatML version of the quality jailbreak
All of the above models performed fairly well to varying degrees. However, from my tests I would recommend the following models for the best performance:
4th dolphin-2.1-mistral-7b - ChatML version of the quality jailbreak
Responds well to the instructions but I found it a bit bland.
3rd trion-m-7b - Alpaca / Roleplay version of the quality jailbreak
Solid, worth a try, quite similar to toppy.
2nd toppy-m-7b - Download Hermans AshhLimaRP SillyTavern templates, then edit them with the quality jailbreak
Hermans AshhLimaRP SillyTavern template seems to solve a brevity problem this model otherwise has when using the regular Alpaca / Roleplay version of the quality jailbreak. Very good output that you should certainly try. You might even prefer it to my number 1 choice.
1st Misted-7B - Alpaca / Roleplay version of the quality jailbreak
A model I’ve never heard anyone talk about, and wow. Its output is so good. It’s flavorful and follows the quality prompt the best of any model I tested, by a good margin.
I manually selected seeds 1-10. Here is its first response in each case. Note that in the 3 examples where its response is overly brief, a simple continue resulted in very good output.
I would HIGHLY recommend you download and try this model even if you have no interest in my quality mod or even roleplay. I imagine the model is simply very good.
In conclusion
If you follow all the steps I’ve laid out here, you will find that 7Bs are indeed capable of quite enjoyable roleplay sessions. They aren’t perfect, and Mistral still has issues in my experience once it goes a bit over 5k-ish context despite its 8k claims, but they are a lot better for roleplay than some people think, and they are only going to get better.
I’m still learning and tweaking things as I go along. I’m still playing about with my quality jailbreak to see if I can get it working better. If anyone has any other good tips or corrections to anything I’ve said please feel free to chime in.
Oh, and it goes without saying that the same field I use to input the quality jailbreak can be used for a lot of things. I saw someone ask how he could make his model respond less politely. It can certainly do that. I even made it finish all its responses with “Nyaa” as a test. One thing to note if you want to try out commands: use positive emphasis rather than negative. Don’t, for example, tell it “Don’t repeat or loop”. Imagine you are speaking to a person who is hard of hearing; such a person might well miss the “don’t” part and simply hear a command saying “repeat or loop”. That’s why I wrote “Write multiple fresh sentences, paragraphs, and phrases.” Don’t ask the model “not to be polite”, as it may simply latch on to “be polite”. Instead say something like “Be direct and straightforward.”
Anyway I’ve rambled on wayyy too much. Hope some people find this helpful.
Can’t you export and upload your settings? It’s kind of a pain to manually type all that
/u/reiniken has reminded me of one important point I didn’t touch on much.
It’s important to replicate the style you want the AI to write in, both in the first message and in your own replies, to help the AI keep replicating the format.
So write narration in 3rd person and add some bracketed thoughts in the introduction message of your card if you follow my guide.
That’s why in my examples I narrate myself in 3rd person. You don’t have to; the AI can keep to the format without you doing so, from my testing, but I think writing your own narration in 3rd person helps the AI keep its narration in 3rd person too. If it sees your narration in 1st person, it could be tempted to write its narration in 1st person.
Thank you for taking the time to write this up, much appreciated
You’re most welcome. It’s the least I can do to give a little back to the community, which has been so helpful to me.
Can you share what your character has for settings? I like how yours is displayed, but I don’t know how I’d set that up. Or an example if you don’t want to share specifics.
It’s actually not my card I just got it from chub.ai. It doesn’t have anything in it which formats the output style (other than the fact models will generally mimic the first message format). Which goes to show the power of the “quality jailbreak” I detail above! That’s what really drills the formatting into the model.
That said, I have made some minor modifications to improve the card (mostly typo fixes, a small modification to the scenario to make it more flexible, and one line added to the introduction to help the model learn the bracketed thoughts format).
Here is my modded card.
Here is the original on chub.ai.
Bookmarked! I’ll see what it says about Amy and my other characters. I spent a lot of time on their wording and am constantly optimizing it.
Speaking of optimizations for character cards, have you heard about Sparse Priming Representations (SPR)? I’ve experimented with it and while I’m not using it directly, I’m applying some of its principles to my cards, saving precious tokens.
This is absolutely amazing, but I have a question. Is there a way to make it consistently generate less text? I’m enjoying my RPs the most when the messages are a bit more on the simpler side (around 100 tokens), but these settings make the AI generate well past the 300 token target. I tried adding stuff like “around 100 words long” or “no more than 100 words” or even “limit yourself to 100 tokens” to the last output sequence, but nothing seems to work.
Hmm.
Well, there is the target length (tokens) setting in SillyTavern’s Advanced Formatting tab.
I’ve got it set to 200 as above, with the Response (tokens) setting set to 300.
The “target” is actually the setting I’ve got at 200; the setting at 300 is merely a “cap” it can’t go over.
So I’d start by changing the target length (tokens) to 100, and change your Response (tokens) cap to say 150-175 to give it a bit of wiggle room.
If that doesn’t work try removing the “be verbose” part of what I wrote if you are using that or edit this part to “Write multiple brief fresh sentences, paragraphs, and phrases.”
Everyone is so excited about this setting, anyone know offhand how it is presented to the backend?
I get good results depending on model asking for a size with some hyperbole, when I want a very short summary I ask for a one sentence summary and get the minimum ideas back, usually two or three to the point statements.
Consider what you ask for: a story or never-ending roleplay will likely return longer messages than “write a concise message to reply as {{Char}}. Do not write endings or drive toward conclusions”. Especially in controlling length, the words don’t trigger the results you’d expect; you gotta experiment with your lexicon a little.
You’re not going to be able to get a specific length always, but you should have good results by tuning in the direction you want until you get your desired output size more often with only outliers containing too much.
I guess the custom “JSON serialized array of strings” part of the “Instruct mode” settings is important.
I am sharing it here as plain text, so others just need to copy and paste:
["", "<|", "\n#", "\n*{{user}} ", "\n\n\n"]
Not going to lie, I updated these a while back when I was newer to the whole AI thing, based on a recommendation, and I had forgotten I even edited them until you just mentioned it.
Pretty sure I changed these because /u/WolframRavenwolf does it xD
Care to enlighten us why these are a good idea Mr wolf.
Most of these are (parts of) EOS (end of sequence) tokens. The model is supposed to send an EOS token to signal that inference is done, as without that, it would keep going until the max new tokens limit is hit.
Unfortunately some models, especially merges with different prompt formats, can get confused and output the wrong token or turn the special token into a regular string. In that case, adding that string (or a part of it) to the custom stopping strings list ensures that inference is properly concluding anyways.
In addition to that, I put the asterisk followed by username there to catch the model trying to act as the user. Just like how the software by default already includes the username followed by a colon, to catch the model trying to talk as user.
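For anyone curious what the frontend does with that list mechanically, here is a simplified sketch. Real implementations stream tokens and handle partial matches at the end of the buffer, which this ignores:

```python
def find_stop(generated, stop_strings):
    """Truncate generated text at the earliest custom stopping string.

    Mirrors, in simplified form, what a frontend's custom stopping strings
    do: if the model emits a stray EOS fragment like "<|", or starts acting
    as the user ("\n*{{user}} "), the output is cut there instead of
    running on to the max new tokens limit.
    """
    cut = len(generated)
    for s in stop_strings:
        idx = generated.find(s)
        if idx != -1:
            cut = min(cut, idx)  # keep only text before the earliest match
    return generated[:cut]
```

So a reply that degenerates into a leaked special token, or into the model speaking for the user, gets cleanly clipped.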
Unless you’re using some integration like Stable Diffusion or TTS, I would just use a prompt with the model itself. Not only is it much faster to generate responses, but it maintains better coherence, because SillyTavern tends to fill up the context window with the stuff it wraps around each response.
round brackets
I believe these are called parentheses.
Ah round brackets vs parentheses is one of those British vs American English things haha.
That said on paper parentheses probably should be the better choice as it should be less likely to be misinterpreted by the model.
I’m giving it a try with parentheses now, thanks!
Thank you, very useful!
Appreciated. If anyone has any issues with anything let me know.
The worst thing in my experience is the damn templates all these models have. So many unique templates with minor tweaks and some models are so sensitive!
I’ve literally given up on some models because I clearly couldn’t figure out the right template smh.
Same, bro, it’s just a mess. It’s like we have this feature, and it’s simple, but we’re intentionally making it harder…