Seems I am doing something wrong with this one.
I got abysmal results with 4_K_M: it made silly grammatical errors and typos, and it also did not stick to the prompt, so I don't know.
For now the biggest models are the best, so there is no single best model for CPU, only the best model you are ready and willing to wait for an answer from.
Like Goliath-120B is great, and I am using it on an i5-12400, getting 0.4 tokens/second, and I don't want anything less now.
SSD is at 100%, reading 1~2 GB/s.
If your SSD is swapping, then the model does not fit into RAM.
Use a smaller quant, like 4_K_M from your own link.
That is pretty good! What quant do you use?
Have you tried Goliath-120B? How fast does it run? It might be even better than Falcon-180B, so it could be worth trying.
Sounds like you run it on CPU. If you are using oobabooga, you have to explicitly set how many layers you offload to the GPU; by default everything runs on the CPU (at least for GGUF models).
By the way, does a demo exist or something?
I guess they might make Grok open source if it is not very good and merely comparable to open-source models, and keep it for themselves if it is better.
The rest are noise.
But why? If you increase the health-restoring parameter, it would change the result.
Also, why did it use logarithms then? That does not seem right.
Well, it gave me very weird results on my updated prompt.
It said the result is 4 or something, also using logarithms.
Prompt:
I have 100 hit points, in 1 minute of fighting with monsters I lose 7 hit points, but also every minute I regenerate 2 hit points. How long I can fight with monsters?
Answer:
To determine how long you can fight with monsters, we need to consider the net loss of hit points per minute. You lose 7 hit points per minute but also regenerate 2 hit points per minute.
Net loss per minute = Loss per minute - Regeneration per minute
Net loss per minute = 7 - 2 = 5 hit points
Now, we need to find out how many minutes it will take for you to lose all 100 hit points with a net loss of 5 hit points per minute.
Let’s use the formula:
Time (in minutes) = Total hit points / Net loss per minute
Time (in minutes) = 100 / 5 = 20 minutes
So, you can fight with monsters for 20 minutes before losing all your hit points.
The answer is: 20.
Actually, … I am impressed!
But same time:
I have 100 hit points, in 1 minute of fighting with monsters I lose 7 hit points, but also every minute I regenerate 2 hit points. I also kill 1 monster every 4 minutes I kill a monster and my current hit points increase by 1. I also need to have 20% of my hit points remain, so I can return to the base. How long I can fight with monsters?
It cannot solve it, not by a long shot. I tried multiple times. GPT-3.5 solved it flawlessly.
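For reference, the second prompt can be brute-forced with a minute-by-minute simulation. This is just a sketch under my own reading of the prompt: a net 5 HP lost per minute of fighting, +1 HP on every 4th minute from a kill, and fighting stops once one more minute would leave less than 20% of the starting HP; the function and parameter names are made up for illustration.

```python
def fight_minutes(hp=100, loss=7, regen=2, kill_every=4, kill_heal=1, reserve_frac=0.2):
    """Simulate fighting minute by minute; return how many whole minutes
    can be fought while keeping at least reserve_frac of the starting HP."""
    reserve = hp * reserve_frac          # 20 HP must remain to return to base
    minutes = 0
    while True:
        nxt = hp - loss + regen          # net -5 HP per minute of fighting
        if (minutes + 1) % kill_every == 0:
            nxt += kill_heal             # +1 HP on every 4th minute (a kill)
        if nxt < reserve:
            return minutes               # one more minute would break the reserve
        hp, minutes = nxt, minutes + 1

print(fight_minutes())  # 16 under these assumptions
```

With `kill_heal=0` and `reserve_frac=0` the same loop reproduces the 20-minute answer to the first prompt, so the two problems differ only in parameters.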
Oh, we definitely need a GGUF variant of this model. I love Goliath-120B (I even think it might be better than Falcon-180B) and would love to run this model.
Is this model any good for SFW stuff?
Every uncensored llm I tried worked fine with SFW stuff.
If you are talking about storytelling, they might be even better than SFW models. And I have also never seen NSFW/uncensored models write NSFW stuff unless explicitly asked to do so.
If the model fits completely inside 12 GB, then it would run faster on the desktop; if the model does not fit into 12 GB but does fit fully into 16 GB, then there is a good chance it would run faster on the laptop with the 16 GB GPU.
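As a rough rule of thumb, you can sanity-check this from the GGUF file size. A minimal sketch, with the caveat that the ~1.5 GB overhead for KV-cache and activations is my own guess and grows with context length:

```python
def fits_in_vram(model_file_gb, vram_gb, overhead_gb=1.5):
    # A GGUF model runs fully on the GPU only if its weights plus
    # KV-cache/activation overhead fit in VRAM; otherwise some layers
    # stay on the CPU and generation slows down sharply.
    return model_file_gb + overhead_gb <= vram_gb

# A ~13 GB file spills over on a 12 GB card but fits a 16 GB one:
print(fits_in_vram(13, 12), fits_in_vram(13, 16))  # False True
```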
I had high hopes for Yi-34B-Chat, but when I tried it I saw it is not very good.
70B models are better (well, of course), but I think even some 20B models are better.
I used oobabooga_windows\text-generation-webui
I think I tested it up to 500 tokens or so.
Running full Falcon-180B under budget constraint
Oh no-no-no, you are doing it wrong ;) Just kidding. The numbers below are for reference, showing what one can get on a budget system without multiple high-end GPUs.
i5-12400F + 128 GB DDR4 + some layers offloaded to a 3060 Ti = 0.35 tokens/second on Falcon-180B 4_K_M
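That figure is consistent with CPU inference being memory-bandwidth-bound: every generated token has to stream roughly the whole weight file from RAM. A back-of-the-envelope sketch, where the ~108 GB file size for Falcon-180B Q4_K_M and the ~38 GB/s effective dual-channel DDR4 bandwidth are my own assumptions:

```python
def est_tokens_per_sec(model_file_gb, mem_bandwidth_gbs):
    # Memory-bound upper estimate: one full pass over the weights per token.
    return mem_bandwidth_gbs / model_file_gb

print(round(est_tokens_per_sec(108, 38), 2))  # ~0.35
```

GPU offload of a few layers nudges this up a bit, which is why the measured number lands in the same ballpark.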
Seems this model has a problem and is not loading.
I tried the GGUF format of this model from Hugging Face and it just won't load.
Interesting, everyone is suggesting 7B models, but you can run much better models using more than just your GPU memory, so I would highly recommend mxlewd-l2-20b: it is very smart and fantastic for writing scenes and such.
Maybe you need to update ooba, but I can confirm: today I loaded and played a bit with DeepSeek-67B GGUF 4_K_M on Windows, running on CPU.