UPDATE: I forgot to mention that I used q8 of all models.
I’ve been very interested in the non-code 34b fine-tunes, whether they’re built on a CodeLlama base or a Yi base. From the old Samantha-34b and Synthia-34b to the new Dolphin-Yi and Nous-Capybara 34b models, I’ve been excited for each one because it fills a gap that needs filling.
My problem is that I can’t seem to wrangle these fine-tunes into working right for me. I use Oobabooga (text-gen-ui), and always try to choose the correct instruction template either specified on the card or on TheBloke’s page, but the models never seem to be happy with the result, and either get confused very easily or output odd gibberish from time to time.
For both Yi models, I am using the newest ggufs that TheBloke put out… yesterday? Give or take. I also tried the previous 2-3 ggufs he published for the same models as each update came out.
The best luck I’ve had with the new Yi models was doing just plain chat mode with my AI Assistant’s character prompt as the only thing being sent in, but even then both Yi fine-tunes that I tried eventually broke down after a few thousand context.
For example, after a bit of chattering with the models I tried a very simple little test on both: “Please write me two paragraphs. The content of the paragraphs is irrelevant, just please write two separate paragraphs about anything at all.” I did that because previous versions of these two struggled to make a new line, so I just wanted to see what would happen. This absolutely confused the models, and the results were wild.
Has anyone had luck getting them to work? They appear to have so much potential, especially Nous Capybara which went toe to toe with GPT-4 in this benchmark, but I’m failing miserably at unlocking its full potential lol. If you have gotten it to work, could you please specify what settings/instructions you’re using?
Try setting repetition penalty to 1.0; it helped a lot on the base model before fine-tuning. Are you limited to trying gguf only? There was an issue with llama.cpp that meant the BOS token was always inserted, but Yi works best without a BOS token. Make sure the llama.cpp version you have in oobabooga has this fixed, or try running the newest llama.cpp exe yourself. I get good results with my 2 private yi-34b qlora fine-tunes and with LoneStriker’s spicyboros 3.1-2 (all exl2). I think it’s better than llama 70b 2.4bpw… I didn’t check dolphin or nous Capybara. To be honest, I am not sure I was filling the context to 4096 in any case; I think I kept it around 300-2000.
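For anyone wondering why 1.0 counts as "off": below is a minimal sketch of the common multiplicative repetition penalty, assuming the usual scheme where the logit of an already-seen token is divided by the penalty if positive and multiplied by it if negative. The function name and the toy logits are made up for illustration; this is not taken from any specific backend.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.0):
    """Return a copy of logits with previously seen tokens penalized.

    Assumed scheme: positive logits are divided by the penalty and
    negative logits are multiplied by it, so penalty=1.0 changes nothing.
    """
    out = list(logits)
    for token_id in set(seen_token_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out


# Toy example: tokens 0 and 1 were already generated.
logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, [0, 1], penalty=1.0))
print(apply_repetition_penalty(logits, [0, 1], penalty=2.0))
```

With a penalty of 1.0 the logits come back unchanged, which is why that value effectively disables the sampler.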
Using exl2 I have not had too many issues with the models. I did not d/l the GGUF because I read all the issues concerning BOS tokens.
They are similar to 70b but with poorer reasoning. With proper template they follow most instructions.
I’ve just been plodding along with them using dynamic temperature and min_P
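In case min_P is unfamiliar, here is a rough sketch of the idea as I understand it: drop every token whose probability is below min_p times the probability of the most likely token, then renormalize. The function name is hypothetical, temperature (dynamic or otherwise) is left out on purpose, and real backends may order the samplers differently.

```python
import math


def min_p_filter(logits, min_p=0.05):
    """Return renormalized probabilities after min-p filtering."""
    peak = max(logits)
    # Softmax, subtracting the max logit for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens at least min_p times as likely as the top token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    kept_total = sum(kept)
    return [p / kept_total for p in kept]


# Toy example: the third token is far less likely and gets cut.
print(min_p_filter([5.0, 4.0, 0.0], min_p=0.1))
```

The cutoff scales with how confident the model is: a sharp distribution prunes aggressively, a flat one keeps more candidates.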
BOS tokens? What issues? I’m familiar enough with a lot of LLM terminology but not that, sorry. And this field moves so fast I don’t know all the gossip!
It was causing the replies to be whack.
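To expand a little: BOS is the "beginning of sequence" token that many tokenizers prepend to the prompt. Going by the comment above, the problem was that it was always being inserted while Yi reportedly does better without it. A minimal sketch of checking for and stripping it from a list of token ids follows; the id value 1 is only a placeholder assumption, so look up the real one for your tokenizer.

```python
BOS_ID = 1  # placeholder assumption; the real id depends on the tokenizer


def strip_leading_bos(token_ids, bos_id=BOS_ID):
    """Return token_ids with every leading BOS token removed."""
    start = 0
    while start < len(token_ids) and token_ids[start] == bos_id:
        start += 1
    return token_ids[start:]


# Toy example: a prompt that ended up with BOS inserted twice.
print(strip_leading_bos([1, 1, 5, 6]))
```

Only leading tokens are touched, so the same id appearing later in the sequence is left alone.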
(I know this is not a roleplay question, but anyway.) These are the settings I use for SillyTavern and the Nous Capybara model. They work perfectly so far, but you also need the character CFG globally enabled at 1.5 to make it stop looping.
{
  "temp": 0.1,
  "temperature_last": true,
  "top_p": 1,
  "top_k": 25,
  "top_a": 0,
  "tfs": 1,
  "epsilon_cutoff": 0,
  "eta_cutoff": 0,
  "typical_p": 1,
  "min_p": 0.05,
  "rep_pen": 1,
  "rep_pen_range": 0,
  "no_repeat_ngram_size": 15,
  "penalty_alpha": 0,
  "num_beams": 1,
  "length_penalty": 1,
  "min_length": 0,
  "encoder_rep_pen": 1,
  "freq_pen": 0,
  "presence_pen": 0,
  "do_sample": true,
  "early_stopping": false,
  "add_bos_token": true,
  "truncation_length": 2048,
  "ban_eos_token": false,
  "skip_special_tokens": true,
  "streaming": true,
  "mirostat_mode": 2,
  "mirostat_tau": 2.55,
  "mirostat_eta": 0.1,
  "guidance_scale": 1,
  "negative_prompt": "",
  "grammar_string": "",
  "banned_tokens": "",
  "type": "ooba",
  "legacy_api": false,
  "rep_pen_size": 0,
  "genamt": 1024,
  "max_length": 16128
}
Might I ask what context and instruct templates you’re using?
I use nous-capybara for nlp processing with json output for work.
Nous Capybara yi-34 q6 gguf works very well for complex summarization of large texts (which requires some instruction following). Temp = 0, repeat penalty in the range 0.9-1.1. Prompt: SYSTEM: USER: ASSISTANT:
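A small sketch of assembling a prompt in that SYSTEM / USER / ASSISTANT shape. The exact whitespace between the parts is not recoverable from the comment, so the separator is a parameter and its default is purely an assumption; the function name is hypothetical too.

```python
def build_prompt(system, user, sep=" "):
    """Assemble a SYSTEM/USER/ASSISTANT prompt.

    The separator between parts is an assumption; try "\n" as well
    if the model behaves oddly with spaces.
    """
    parts = [f"SYSTEM: {system}", f"USER: {user}", "ASSISTANT:"]
    return sep.join(parts)


print(build_prompt("You summarize documents.", "Summarize the text below."))
```

The prompt ends on the bare "ASSISTANT:" label so the model continues from there with its reply.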
neural chat 7b_q8:
Paragraph One: Life is full of ups and downs, twists and turns. It's important to find a balance between embracing change and holding onto what truly matters in our lives. Throughout the years, I have learned that the key to happiness lies within fostering strong connections with family, friends, and people who bring positivity into our world. In times of adversity, it is these relationships that provide unwavering support and love.
Paragraph Two: As we progress through life, challenges will undoubtedly present themselves. The ability to adapt and grow from those difficulties is crucial for personal growth and self-improvement. Embracing change can often be frightening, but it also offers opportunities to learn new skills, strengthen our abilities, and push the boundaries of what we once thought was possible. Ultimately, it's through embracing these life lessons that we evolve into our best selves, fostering resilience and personal fulfillment.
I repeated this a few times with different topics and it never fails: paragraph, newline, paragraph.
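If you want to run the two-paragraph test repeatedly without eyeballing each reply, a minimal sketch of a checker follows. It assumes a "paragraph" is any non-empty line, which matches the paragraph, newline, paragraph pattern described here; the function name is made up.

```python
def count_paragraphs(text):
    """Count non-empty lines, treating each one as a paragraph."""
    return sum(1 for line in text.split("\n") if line.strip())


reply = "Paragraph One: Life is full of ups and downs.\n\nParagraph Two: Challenges will come."
print(count_paragraphs(reply))
```

A reply passes the test when the count is exactly 2; a model that cannot emit a newline will return 1 no matter how much it writes.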