What is considered the best uncensored LLM right now?

Hyddro26@alien.top · 3 years ago

What is considered the best uncensored LLM right now?

Brave-Decision-1944@alien.top · 3 years ago

One more thing, everyone: with LLMs you can use multiple GPUs simultaneously, and also bring in system RAM (even SSDs acting as RAM, sped up with RAID 0) and the CPU, all at once, splitting the load.

So if your GPU has 24GB, you are not limited to that in this case.

In practice: I used https://github.com/oobabooga/text-generation-webui

I copied the Augmental-Unholy-13B-GGUF folder into the models folder. In the UI I just selected load model, and it automatically switched to llama.cpp.

But the n-gpu-layers setting is at 0, which is wrong; for this model I set it to 45-55. The result was that it loaded and used my second GPU (an NVIDIA 1050 Ti) with no SLI; the primary is a 3060, and both were running fully loaded. The n_ctx setting (context length) is the "CPU load"; I had to drop it to ~2300 because my CPU is older. After that it ran pretty fast, up to Q4_K_M. Most of the slowdown came while the SSD was at 100% load, which is why I'm thinking of RAID 0 (which would be ideal, because it was reading one big chunk at top speed), but I haven't got that second physical drive yet.

Batch 512, threads 8, threads batch 8: these settings were pure guesses, but they worked, and I need to get back to them to understand them properly. This extra detail may help if you want to try this on an old AMD posing as an FX 8370 8-core, with 14GB of DDR3 RAM acting as 10GB.
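For anyone who prefers a script to the UI, here is a rough sketch of the same settings passed to llama.cpp through the llama-cpp-python bindings. This is an assumption on my part rather than what the webui does internally: the keyword names are the ones the `Llama` constructor accepts, 50 is just a midpoint of the 45-55 range above, and `load_model` and its `model_path` argument are hypothetical helpers for illustration.

```python
# Settings from the post, expressed as llama-cpp-python keyword arguments.
# Assumption: the webui sliders map onto these names one-to-one.
SETTINGS = {
    "n_gpu_layers": 50,     # post used 45-55; 0 keeps every layer on the CPU
    "n_ctx": 2300,          # context length, lowered for an older CPU
    "n_batch": 512,         # prompt-processing batch size
    "n_threads": 8,         # CPU threads for generation
    "n_threads_batch": 8,   # CPU threads for batch/prompt processing
}


def load_model(model_path, **overrides):
    """Hypothetical helper: load a GGUF file with the settings above.

    The import is done lazily so the settings can be inspected
    without llama-cpp-python installed.
    """
    from llama_cpp import Llama  # pip install llama-cpp-python

    params = {**SETTINGS, **overrides}
    return Llama(model_path=model_path, **params)


if __name__ == "__main__":
    # Print the settings; pass your own .gguf path to load_model() to use them.
    for name, value in SETTINGS.items():
        print(f"{name} = {value}")
```

Calling `load_model("path/to/your-model.gguf", n_ctx=4096)` would override a single value while keeping the rest; treat the numbers as starting points, not tuned values.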

YuriWerewolf@alien.top · 3 years ago

How did you set settings for memory sharing (layers) between gpus? I have 2 gpus: 3060Ti and 3060 and it seems like it tries to load everything on the first one and goes out of memory.

Brave-Decision-1944@alien.top · 3 years ago

Like this, to be exact:

https://preview.redd.it/bj2znub0r91c1.png?width=720&format=pjpg&auto=webp&s=c56dfb9dafd65a7c3f8ea73624118ff9c75de47d
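In case the screenshot does not load: llama.cpp has a tensor_split option that takes a comma-separated list of proportions, one per GPU, so the model is not piled onto the first card. Below is a minimal sketch of how one might work out those proportions from VRAM sizes. The 8GB and 12GB figures for a 3060 Ti and a 3060 are assumptions (check your own cards), and `tensor_split` and `reserve_gb` are hypothetical names used only for this example.

```python
def tensor_split(vram_gb, reserve_gb=1.0):
    """Return per-GPU proportions for splitting a model by usable VRAM.

    vram_gb    -- VRAM per GPU in GB, in device order
    reserve_gb -- headroom kept free on each GPU (an assumed safety margin)
    """
    usable = [max(v - reserve_gb, 0.0) for v in vram_gb]
    total = sum(usable)
    if total == 0:
        raise ValueError("no usable VRAM left after the reserve")
    return [round(u / total, 2) for u in usable]


if __name__ == "__main__":
    # Assumed order: device 0 = 3060 Ti (8GB), device 1 = 3060 (12GB).
    split = tensor_split([8, 12])
    # Comma-separated form, as the tensor_split field expects.
    print(",".join(str(s) for s in split))
```

The idea is simply to give the larger card a proportionally larger share; if one GPU still runs out of memory, shift the proportions further toward the other one or lower n-gpu-layers.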