Seems I am doing something wrong with this one.
I got abysmal results with 4_K_M: it made silly grammatical errors and typos, and it also did not stick to the prompt, so I don't know.
For now the biggest models are the best, so there is no single best model for CPU, only the best model you are ready and willing to wait for an answer from.
Like Goliath-120B is great, and I am using it on an i5-12400, getting 0.4 tokens/second, and I don't want anything less now.
SSD is at 100%, reading 1~2 GB/s.
If your SSD is swapping, then the model does not fit into RAM.
Use a smaller quant, like 4_K_M from your own link.
That is pretty good! What quant do you use?
Have you tried Goliath-120B? How fast does it run? It might be even better than Falcon-180B, so it could be worth trying.
Sounds like you run it on CPU. If you are using oobabooga, you have to explicitly set how many layers you offload to the GPU; by default everything runs on the CPU (at least for GGUF models).
By the way, does a demo exist or something?
I guess they might make Grok open source if it is not very good and merely comparable to open-source models, and keep it for themselves if it is better.
The rest are noise.
But why? If you increase the health-restoring parameter, it would change the result.
Also, why did it use logarithms then? That does not seem right.
Well, it gave me very weird results on my updated prompt.
It said the result is 4 or something, also using logarithms.
Prompt:
I have 100 hit points, in 1 minute of fighting with monsters I lose 7 hit points, but also every minute I regenerate 2 hit points. How long I can fight with monsters?
Answer:
To determine how long you can fight with monsters, we need to consider the net loss of hit points per minute. You lose 7 hit points per minute but also regenerate 2 hit points per minute.
Net loss per minute = Loss per minute - Regeneration per minute
Net loss per minute = 7 - 2 = 5 hit points
Now, we need to find out how many minutes it will take for you to lose all 100 hit points with a net loss of 5 hit points per minute.
Let’s use the formula:
Time (in minutes) = Total hit points / Net loss per minute
Time (in minutes) = 100 / 5 = 20 minutes
So, you can fight with monsters for 20 minutes before losing all your hit points.
The answer is: 20.
Actually, … I am impressed!
But same time:
I have 100 hit points, in 1 minute of fighting with monsters I lose 7 hit points, but also every minute I regenerate 2 hit points. I also kill 1 monster every 4 minutes I kill a monster and my current hit points increase by 1. I also need to have 20% of my hit points remain, so I can return to the base. How long I can fight with monsters?
It cannot solve it, not by a long shot. I tried multiple times. GPT-3.5 solved it flawlessly.
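For reference, the second prompt can be brute-forced with a minute-by-minute simulation. This is just a sketch under my own reading of the prompt: a net 5 HP lost per minute of fighting, +1 HP on every 4th minute from a kill, and fighting stops once one more minute would leave less than 20% of the starting HP; the function and parameter names are made up for illustration.

```python
def fight_minutes(hp=100, loss=7, regen=2, kill_every=4, kill_heal=1, reserve_frac=0.2):
    """Simulate fighting minute by minute; return how many whole minutes
    can be fought while keeping at least reserve_frac of the starting HP."""
    reserve = hp * reserve_frac          # 20 HP must remain to return to base
    minutes = 0
    while True:
        nxt = hp - loss + regen          # net -5 HP per minute of fighting
        if (minutes + 1) % kill_every == 0:
            nxt += kill_heal             # +1 HP on every 4th minute (a kill)
        if nxt < reserve:
            return minutes               # one more minute would break the reserve
        hp, minutes = nxt, minutes + 1

print(fight_minutes())  # 16 under these assumptions
```

With `kill_heal=0` and `reserve_frac=0` the same loop reproduces the 20-minute answer to the first prompt, so the two problems differ only in parameters.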
Oh, we definitely need a GGUF variant of this model. I love Goliath-120B (I even think it might be better than Falcon-180B) and would love to run this model.
Is this model any good for SFW stuff?
Every uncensored llm I tried worked fine with SFW stuff.
If you are talking about storytelling, they might be even better than SFW models. And I have also never seen NSFW/uncensored models write NSFW stuff unless explicitly asked to do so.
If the model fits completely inside 12 GB, then it would run faster on the desktop; if the model does not fit into 12 GB but does fit fully into 16 GB, then there is a good chance it would run faster on the laptop with the 16 GB GPU.
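As a rough rule of thumb, you can sanity-check this from the GGUF file size. A minimal sketch, with the caveat that the ~1.5 GB overhead for KV-cache and activations is my own guess and grows with context length:

```python
def fits_in_vram(model_file_gb, vram_gb, overhead_gb=1.5):
    # A GGUF model runs fully on the GPU only if its weights plus
    # KV-cache/activation overhead fit in VRAM; otherwise some layers
    # stay on the CPU and generation slows down sharply.
    return model_file_gb + overhead_gb <= vram_gb

# A ~13 GB file spills over on a 12 GB card but fits a 16 GB one:
print(fits_in_vram(13, 12), fits_in_vram(13, 16))  # False True
```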
I had high hopes for Yi-34B-Chat, but when I tried it I saw it is not very good.
70B models are better (well, of course), but I think even some 20B models are better.
I used oobabooga_windows\text-generation-webui
I think I tested it up to 500 tokens or so.
Running full Falcon-180B under budget constraint
Oh no-no-no, you are doing it wrong ;) Just kidding. The numbers below are for reference, showing what one can get on a budget system without multiple high-end GPUs.
i5-12400F + 128 GB DDR4 + some layers offloaded to a 3060 Ti = 0.35 tokens/second on Falcon-180B 4_K_M
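That figure is consistent with CPU inference being memory-bandwidth-bound: every generated token has to stream roughly the whole weight file from RAM. A back-of-the-envelope sketch, where the ~108 GB file size for Falcon-180B Q4_K_M and the ~38 GB/s effective dual-channel DDR4 bandwidth are my own assumptions:

```python
def est_tokens_per_sec(model_file_gb, mem_bandwidth_gbs):
    # Memory-bound upper estimate: one full pass over the weights per token.
    return mem_bandwidth_gbs / model_file_gb

print(round(est_tokens_per_sec(108, 38), 2))  # ~0.35
```

GPU offload of a few layers nudges this up a bit, which is why the measured number lands in the same ballpark.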
Seems this model has a problem and is not loading.
I tried the GGUF format of this model from Hugging Face and it just won't load.
Interesting, everyone is suggesting 7B models, but you can run much better models using more than just your GPU memory, so I would highly recommend mxlewd-l2-20b: it is very smart and fantastic for writing scenes and such.
Maybe you need to update ooba, but I can confirm: today I loaded and played a bit with DeepSeek-67B GGUF 4_K_M on Windows, running on CPU.