What is considered the best uncensored LLM right now?

Hyddro26@alien.top · 2 years ago

What is considered the best uncensored LLM right now?

hwpoison@alien.top · 2 years ago

A fine-tune of Mistral can be insane haha

1dayHappy_1daySad@alien.top · 2 years ago

I test a bunch of models; as of today I would say it is dolphin-2_2-yi-34b.

Brave-Decision-1944@alien.top · 2 years ago

People, one more thing: with an LLM you can use multiple GPUs simultaneously, and also include RAM (and use SSDs as RAM, boosted with RAID 0) and the CPU, all at once, splitting the load.

So if your GPU has 24GB, you are not limited to that in this case.

In practice: I used https://github.com/oobabooga/text-generation-webui

I copied the Augmental-Unholy-13B-GGUF folder to the models folder. In the UI I just selected load model, and it automatically switched to llama.cpp.

But the n-gpu-layers setting is set to 0, which is wrong; for this model I set it to 45-55. The result was that it loaded and used my second GPU (NVIDIA 1050 Ti) as well, with no SLI; the primary is a 3060, and both were running fully loaded. The n_ctx setting is the “load on the CPU”; I had to drop it to ~2300 because my CPU is older. Now it ran pretty fast, up to Q4_K_M. Most of the slowdown happened while the SSD was at 100% load, which is why I’m thinking of RAID 0 (which would be ideal because it was one big chunk at top speed), but I haven’t bought another physical drive yet.

Batch 512, threads 8, threads batch 8; these settings were pure guesses but they worked, and I have to get back to it to understand it properly. This side information may help if you want to try this on an old AMD posing as an FX 8370 8-core, with 14GB DDR3 RAM acting as 10GB.
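
For anyone who prefers the command line over the web UI, here is a minimal sketch of how the settings above could map onto llama.cpp's command-line flags. The flag names (--model, --n-gpu-layers, --ctx-size, --threads, --batch-size) are llama.cpp options; the binary name ./llama-cli and the .gguf file name are assumptions for illustration, so adjust them to your build and download.

```python
import shlex


def build_llama_cpp_args(model_path, n_gpu_layers, n_ctx, threads, batch_size):
    """Return an argument list for a llama.cpp command-line binary."""
    return [
        "--model", model_path,
        "--n-gpu-layers", str(n_gpu_layers),  # layers offloaded to GPU; 0 means CPU only
        "--ctx-size", str(n_ctx),             # context length in tokens
        "--threads", str(threads),            # CPU threads used for generation
        "--batch-size", str(batch_size),      # prompt-processing batch size
    ]


args = build_llama_cpp_args(
    # Hypothetical file name: use whatever .gguf file is in your models folder.
    model_path="models/Augmental-Unholy-13B-GGUF/model.Q4_K_M.gguf",
    n_gpu_layers=45,
    n_ctx=2300,
    threads=8,
    batch_size=512,
)

# "./llama-cli" is an assumed binary name; older builds call it "./main".
print(shlex.join(["./llama-cli"] + args))
```

The values are just the ones from the post, not tuned recommendations.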

YuriWerewolf@alien.top · 2 years ago

How did you configure memory sharing (layers) between GPUs? I have 2 GPUs, a 3060 Ti and a 3060, and it seems to try to load everything on the first one and runs out of memory.

Brave-Decision-1944@alien.top · 2 years ago

Like this to be exact

https://preview.redd.it/bj2znub0r91c1.png?width=720&format=pjpg&auto=webp&s=c56dfb9dafd65a7c3f8ea73624118ff9c75de47d
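
The screenshot is not reproduced in text here, so the following is not a transcription of it. As a rough sketch of one common approach: llama.cpp accepts a tensor split, a comma-separated list of proportions saying how much of the model goes on each GPU. The helper below is hypothetical and just derives those proportions from VRAM sizes. The 8 GB and 12 GB figures are assumed for a 3060 Ti and a 3060, and the 1 GB of headroom per card is a guess.

```python
def tensor_split_from_vram(vram_gb, reserve_gb=1.0):
    """Split proportions per GPU, based on usable VRAM after reserving headroom."""
    usable = [max(v - reserve_gb, 0.0) for v in vram_gb]
    total = sum(usable)
    if total == 0:
        raise ValueError("no usable VRAM on any GPU")
    return [round(u / total, 2) for u in usable]


# Assumed VRAM: 3060 Ti = 8 GB, 3060 = 12 GB (check your own cards).
split = tensor_split_from_vram([8, 12])
print(",".join(str(s) for s in split))
```

The printed string is the kind of value that goes into a tensor split field or flag; if the first card still runs out of memory, shifting more weight to the second entry is the obvious thing to try.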

CNWDI_Sigma_1@alien.top · 2 years ago

zephyr-7b-beta works the best for me

flossraptor@alien.top · 2 years ago

For some people “uncensored” means it hasn’t been lobotomized, but for others it means it can write porn.

Useful_Hovercraft169@alien.top · 2 years ago

Why not both?

motodavide@alien.top · 2 years ago

I like Wizard Vicuna Uncensored

Sweet_Protection_163@alien.top · 2 years ago

34B Nous-capybara was the only model I could use reliably for complicated NLP and JSON output. My go-to for any real work. The first, really.

LienniTa@alien.top · 2 years ago

GGUF Goliath will give you the best answers but will be very slow. You can offload something like 40 layers to VRAM and your RAM will still be the speed bottleneck, but I think 2 t/s is possible on a 2-bit quant.
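
To see why RAM stays the bottleneck, here is a back-of-the-envelope sketch. It assumes generation is memory-bound, so each token needs one full read of the weights, with the CPU-side layers read at RAM bandwidth and the GPU-side layers at VRAM bandwidth. Every number in the example (model size, layer count, bandwidths) is an illustrative assumption, not a measurement.

```python
def estimate_tokens_per_second(model_gb, n_layers, gpu_layers, ram_gbps, vram_gbps):
    """Rough upper bound on speed: every token reads all weights once."""
    per_layer_gb = model_gb / n_layers
    gpu_gb = per_layer_gb * min(gpu_layers, n_layers)  # weights held in VRAM
    cpu_gb = model_gb - gpu_gb                         # weights left in system RAM
    seconds_per_token = cpu_gb / ram_gbps + gpu_gb / vram_gbps
    return 1.0 / seconds_per_token


# Assumed: ~50 GB 2-bit quant, 137 layers, 40 layers offloaded,
# 50 GB/s system RAM, 900 GB/s VRAM.
tps = estimate_tokens_per_second(50, 137, 40, ram_gbps=50, vram_gbps=900)
print(f"{tps:.2f} tokens/s (upper bound under these assumptions)")
```

With these made-up numbers the CPU-side read dominates the time per token, which matches the point that offloading 40 layers helps but does not remove the RAM bottleneck.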

BlueMetaMind@alien.top · 2 years ago

Best experience I had was with TheBloke/Wizard-Vicuna-30B-Uncensored-GGML

Best 30B LLM so far in general. Censorship kills capabilities.

trollsalot1234@alien.top · 2 years ago

you can probably run TheBloke/Chronoboros-33B-GGUF pretty ok.

howzero@alien.top · 2 years ago

Best is subjective, but the recently released LLAMA2-13B-Psyfighter2 is phenomenal, in my opinion. https://huggingface.co/KoboldAI/LLaMA2-13B-Psyfighter2-GGUF

pepe256@alien.top · 2 years ago

Better than tiebreaker?

BriannaBromell@alien.top · 2 years ago

I’m using this and it’s shockingly great:
https://huggingface.co/TheBloke/Xwin-MLewd-7B-V0.2-GPTQ

Just discovering TheBloke/Xwin-MLewd-13B-v0.2-GPTQ

1dayHappy_1daySad@alien.top · 2 years ago

I’ve used the gguf version of Xwin-MLewd-13b and it’s the smartest 13b I’ve found so far

zumba75@alien.top · 2 years ago

What app are you using it in? I tried the 13b in oobabooga and wasn’t able to make it work consistently (it starts replying for me after a short while).

BriannaBromell@alien.top · 2 years ago

I just recently wrote my own pure Python/chromadb program, but before that I had great success with this model in oobabooga. I think maybe there is an overlooked setting that I enabled in oobabooga, or maybe it’s one of the generation kwargs that just seems to work flawlessly. The model has issues keeping itself separate from the user, so take care with your wording in the system message too.

Having seen the model’s tokenizer.default_chat_template, that isn’t unbelievable; it’s a real mess with impossible conditions.

My health is keeping me from writing a better response, but if you’re dead set on using it, message me and we’ll work it out together. I like this model the most.
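
One generic workaround for a model that starts writing the user's turn is to cut the output at a stop string. Below is a minimal sketch in plain Python; the role labels are hypothetical placeholders, since the actual labels depend on the prompt format you use with this model.

```python
def truncate_at_stop(text, stop_strings):
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


# Hypothetical raw output where the model keeps going and speaks for the user.
raw = "Sure, here is the plan.\nUSER: thanks, what next?\nASSISTANT: ..."
reply = truncate_at_stop(raw, ["\nUSER:", "\nUser:"])
print(reply)
```

Most front ends and generation APIs have their own stop-sequence option, which is preferable because it also stops generation early instead of trimming afterwards.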

nero10578@alien.top · 2 years ago

Wonder what card you have that’s 20GB?

Herr_Drosselmeyer@alien.top · 2 years ago

What are you looking for?

With a 3090, you can run any 13b model in 8 bit, group size 128, act order true, at decent speed.

Go-tos for the more spicy stuff would be MythoMax and Tiefighter.

shaman-warrior@alien.top · 2 years ago

Do you know if 13b-8bit is better than 70b quantized?

TuuNo_@alien.top · 2 years ago

https://github.com/ggerganov/llama.cpp/pull/1684 A higher parameter count should always be better.
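
For the memory side of that comparison, a quick sketch of the arithmetic: weight size is roughly parameter count times bits per weight, divided by 8 to get bytes. This ignores the KV cache and per-format overhead, and the bit widths are nominal assumptions (real quant formats use slightly more bits per weight). It says nothing about quality, only about what fits.

```python
def weight_size_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB: 1e9 params * bits / 8 bytes each."""
    return params_billion * bits_per_weight / 8


# Nominal bit widths, chosen for illustration only.
configs = [("13B @ 8-bit", 13, 8), ("70B @ 4-bit", 70, 4), ("70B @ 2-bit", 70, 2)]
sizes = {name: weight_size_gb(p, b) for name, p, b in configs}
for name, gb in sizes.items():
    print(f"{name}: ~{gb:.1f} GB of weights")
```

So a quantized 70B is still larger than a 13B at 8 bits, which is why it usually needs CPU offloading on a single 24GB card.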

AbsorbingCrocodile@alien.top · 2 years ago

That’s actually so funny, the 2 times I’ve asked this before, I get downvoted to shit.