@BoshiAI

BoshiAI@alien.top · 2 years ago

Thanks for confirming this. I’ve seen so much praise for these models, yet I’ve had no end of problems trying to get decent, consistent output. A couple of Yi finetunes seem better than the rest, but there are still too many problems for me to prefer them over other models (for RP/chat purposes).

I’m still hopeful it’s just a matter of time (and a fair amount of trial and error) before app developers, model mixers, and I work out how to get fantastic, consistent out-of-the-box results.

BoshiAI@alien.top · 2 years ago

Potentially dumb but related question:

I know the Apple M-series chips can use up to ~70% of their unified memory for “GPU” (VRAM) purposes. Loading a Yi-34B model takes about 20GB, which just about uses all of that up.

So: given I still have maybe 8GB of RAM left over (assuming I leave 4GB for the system), would I be able to allocate a 128K context buffer and have it live in “normal” RAM?

I’m assuming the heavy computational load is the inference itself, with the model loaded in “VRAM” and handled by the GPU side of the chip. But can the context buffer be loaded into the remaining RAM and still work at a decent speed? Or do both the model and the context buffer have to sit in “VRAM” to run at a decent speed?
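
For a rough sense of scale, here is a back-of-envelope sketch of how big that context buffer (the KV cache) might get. It assumes the Yi-34B shape is 60 layers, 8 key/value heads (grouped-query attention) and a head dimension of 128, and that the cache is stored unquantized in fp16; those figures are assumptions, so check them against the model's config before trusting the numbers. The function name and parameters are made up for illustration, and real backends add their own overhead.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate the bytes needed to cache keys and values for n_tokens."""
    # The leading 2 accounts for one key tensor plus one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


# ASSUMED Yi-34B shape (not taken from this thread): verify against the config.
YI_34B = dict(n_layers=60, n_kv_heads=8, head_dim=128)

for ctx in (4096, 32768, 131072):
    size = gib(kv_cache_bytes(ctx, **YI_34B))
    print(f"{ctx:>7} tokens: {size:.2f} GiB of fp16 KV cache")
```

Under those assumptions a full 128K (131072-token) cache comes out at around 30 GiB, well beyond 8GB of spare RAM wherever it is placed, while roughly 32K tokens lands near 7.5 GiB. A quantized cache would shrink these figures proportionally (pass a smaller `bytes_per_value`).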

BoshiAI@alien.top · 2 years ago

We need a monthly summary, at least, but even that feels like far too long an interval given how fast things are evolving lately. One moment, we all seem to agree MythoMax is the bee’s knees, then suddenly we’ve got Mythalion and a bunch of REMM variants. Next, we’re getting used to Mistral 7Bs giving those 13B models a run for their money, and then Yi-34B 200K and Yi-34B Chat appear out of nowhere. Decent, out-of-the-box RP mixes and fine-tunes of those surely won’t be far behind…

It feels like this has all happened in the past couple of weeks.
Don’t get me wrong, I love it, but I’m dizzy! Excited, but dizzy.