None at all, it’s just a merge. I’m not even really sure where to begin training it lol.
I don’t think this is true. Goliath wasn’t fine-tuned or trained at all and it outperforms every 70b I’ve ever used.
Hi, I’m the creator of Venus-120b.
Venus has Synthia 1.5 mixed in with it, which as you noted performs pretty badly on RP. I’m currently working on a trimmed-down version of Venus that has 100b parameters, and I’m using SynthIA 1.2b for that, which I believe scored much better in your last RP tests. I’ll probably also make a 1.1 version of Venus-120b that uses SynthIA 1.2b as well, to see if that helps fix some of the issues with it.
Venus-120b is actually a bit bigger than Goliath-120b. Venus has 140 layers and Goliath has 136 layers, so that would explain it.
Crap, what’s your setup? I tested it with a single 48GB card, but if you’re using 2x24GB then it might not work. I’ll have to make a 2.8 bpw quant (or get someone else to do it) so that it’ll work with card splitting.
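For anyone wondering why a lower bpw is needed for 2x24GB: a rough back-of-the-envelope calculation (weights only, ignoring KV cache and per-card overhead, so real usage is higher) looks like this:

```python
# Rough size of quantized weights in GiB. Illustrative only:
# ignores KV cache, activations, and loader overhead.
def quant_gib(n_params_billion: float, bpw: float) -> float:
    return n_params_billion * 1e9 * bpw / 8 / 2**30

# ~120b params at 3.0 bpw is ~41.9 GiB of weights alone,
# while 2.8 bpw drops that to ~39.1 GiB, leaving headroom
# for context when split across 2x24GB cards.
print(round(quant_gib(120, 3.0), 1))  # ~41.9
print(round(quant_gib(120, 2.8), 1))  # ~39.1
```

So a 3.0 bpw quant technically fits in 48GB total, but once you add cache and splitting overhead per card, it gets tight on 2x24.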
I used this dataset for the quants: https://huggingface.co/datasets/jasonkstevens/pippa-llama2-chat/tree/refs%2Fconvert%2Fparquet/default/train
Yeah I wanted a picture to go with the model and that’s what stable diffusion spat out :D
And I haven’t tried it for SFW stuff but my guess is that it would work fine.
🤔 How are you trying to load it? I tested both quants in text-generation-webui and they worked fine for me. I used exllama2_hf to load it.
Hard to say. Try it out and let me know!
Try it out and let me know! I included Nous-Hermes in the merge because I’ve found it to be one of the best roleplaying models that doesn’t hallucinate too much. However, in my experience Nous-Hermes also tends to lack a bit in terms of the prose it writes. I was hoping to get something that’s coherent most of the time while still being creative.
Thanks! I’m eager to see the results :)
I use sillytavern along with text-generation-webui in api mode. Best setup for roleplay imo.
min P seems similar to Tail Free Sampling. I think the difference is that TFS tries to identify the “tail” of the distribution by computing the derivative of the token probability function, while min P just cuts off any token whose probability is below some fraction of the top token’s probability.
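For illustration, here’s a minimal sketch of how min P filtering works (the function name and exact renormalization are my own illustration, not any particular loader’s implementation):

```python
import numpy as np

def min_p_filter(probs, min_p=0.05):
    """Keep only tokens whose probability is at least
    min_p * (probability of the most likely token),
    then renormalize the survivors."""
    probs = np.asarray(probs, dtype=float)
    threshold = min_p * probs.max()   # cutoff scales with top token
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()  # renormalize to sum to 1
```

E.g. with probs `[0.6, 0.25, 0.1, 0.04, 0.01]` and `min_p=0.1`, the threshold is 0.06, so the last two tokens are dropped and the rest get rescaled. TFS instead looks at how fast the sorted probabilities fall off to decide where the tail starts.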
I wasn’t aware that EXL2 had issues with quality. Your tests seem to suggest that an equivalent bpw in EXL2 produces worse results than in GGUF. I wonder why that is.
Goliath wasn’t fine-tuned at all, it’s just a merge.