From the issue about this in the exllamav2 repo, QuIP was using more memory and was slower than exl. How much context can you fit?
I’m not getting a super huge jump with the bigger models yet, just a mild bump. I got a P100 so I can load the low-100B models and still have exllama work. That’s 64GB of FP16-capable VRAM.
For bigger models I can use FP32 and put the 2 P40s back in. That’s 120GB of VRAM. Also 6 vidya cards :P
It required building toward this type of system from the start. I’m not made of money either; I just upgrade it over time.
It really is Christmas.
I got a P100 for like $150 to see how well it will work with exllama + 3090s and if it is any faster at SD.
These guys are all gone already.
Would be cool to see this in a 34b and 70b.
Aren’t there people selling such services to companies here? Implementing RAG, etc.
Heh, 72b with 32k and GQA seems reasonable. Will make for interesting tunes if it’s not super restricted.
That’s a good sign if anything.
one is not enough
Does it give refusals on base? 67B sounds like full foundation train.
Something is wrong with your environment. Even P40s give more than that.
The other option is that you aren’t generating enough tokens to get a proper t/s reading. What was the total inference time?
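For what it’s worth, here’s a minimal sketch (all numbers hypothetical) of why a short generation makes t/s look worse than it really is: the one-time prompt-processing cost gets averaged into very few tokens.

```python
# Rough sketch (hypothetical numbers) of why short runs under-report t/s:
# total time includes one-time prompt processing, so when few tokens are
# generated the apparent tokens/sec looks much worse than the decode rate.
prompt_processing_s = 2.0   # assumed one-time cost for the prompt
decode_rate_tps = 15.0      # assumed true per-token decode speed

for generated_tokens in (16, 128, 1024):
    total_s = prompt_processing_s + generated_tokens / decode_rate_tps
    apparent_tps = generated_tokens / total_s
    print(f"{generated_tokens:>5} tokens -> apparent {apparent_tps:.1f} t/s "
          f"(true decode {decode_rate_tps} t/s)")
```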
Welcome to the beginning of the death of shared reality. It’s on the chopping block after objective truth. The latter is almost done.
GS and SG merge different models.
I just got a P100 for like $150, going to test it out and see how it does with its FP16 vs P40 for SD and exllama overflow.
The 4060 is faster, but it’s multiple times as expensive. For your sole GPU you really need 24GB+. AMD cards are becoming somewhat competitive but still come with some hassle and slowness.
CPU is going to give you 3t/s; it’s not really anywhere near, even with the best procs. Sure, get it for other things in the system, but don’t expect it to help much with ML. I guess newer platforms will get you faster RAM, but it’s not enough.
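A rough back-of-envelope (assumed numbers, not a benchmark) for why CPU decode speed caps out there: each generated token has to stream roughly the whole model out of RAM, so t/s is about bandwidth divided by model size.

```python
# Back-of-envelope: token generation is roughly memory-bandwidth bound,
# so tokens/sec ~= RAM bandwidth / bytes read per token (~ model size).
# All numbers below are assumptions for illustration only.
ram_bandwidth_gbs = 90.0   # assumed dual-channel DDR5-class bandwidth
model_size_gb = 38.0       # assumed ~70B model at ~4-bit quantization

approx_tps = ram_bandwidth_gbs / model_size_gb
print(f"~{approx_tps:.1f} t/s upper bound")  # lands in the low single digits
```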
Wonder how L1 65b would do with L2 70b.
Pretty cool hack. Beats CPU inference at those speeds for sure.
Maxwell is pretty dead.
P40 and 3090, those are your “affordable” 24GB GPUs, unless you want to go AMD or have enough to run 3x16GB or something.
Let the merging begin!
Good luck. Centrism is not allowed. You would have to skip the last decade of internet data. Social engineering works for both people and language models much the same.