100B, 220B, and 600B models on huggingface!

Illustrious_Sand6784@alien.top · 2 years ago

FaustBargain@alien.top · 2 years ago

How much RAM do you think the 600B would take? I have 512GB and I can fit another 512GB in my box before I run out of slots. I think with 1TB I should be able to run it unquantized, because Falcon 180B used slightly less than half my RAM.
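For a rough sanity check on that question, the weights alone take roughly parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the `BYTES_PER_PARAM` table and the `weight_memory_gb` helper are illustrative names, not from any library, and the figures ignore KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Back-of-the-envelope estimate of the memory needed for model weights only.
# Assumption: dense model, every parameter stored at the same precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params_billion: float, dtype: str = "fp16") -> float:
    """Return approximate weight size in decimal GB (1 GB = 1e9 bytes).

    n_params_billion * 1e9 params * bytes_per_param / 1e9 bytes per GB
    simplifies to n_params_billion * bytes_per_param.
    """
    return n_params_billion * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    # The three model sizes named in the thread title.
    for size in (100, 220, 600):
        for dtype in ("fp16", "int8", "int4"):
            gb = weight_memory_gb(size, dtype)
            print(f"{size}B @ {dtype}: ~{gb:.0f} GB for weights")
```

By this estimate a 600B model at 16-bit precision needs about 1200 GB for weights alone, which is more than 1TB of RAM, so it would likely need at least 8-bit quantization to fit.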

theyreplayingyou@alien.top · 2 years ago

Can you please share a bit more about your setup and experiences?

I’ve been looking to use some of my idle enterprise gear for LLMs, but everyone tells me not to bother. I’ve got a few dual Xeon boxes with quad-channel DDR4 in 256 and 384GB capacities, NVMe or RAID10 SSDs, 10GbE, etc., and I guess (having not yet experienced it) I have a hard time imagining the equivalent of 120GHz, 0.5 to 1TB of RAM, and 7GB/s disk reads “not being fast enough.” I don’t need instant responses from a sex chatbot; rather, I would like to run a model that can help my wife (in the medical field) with work queries, help my school-age kid with math and grammar questions, etc.

Thank you much!

FaustBargain@alien.top · 2 years ago

If you have the RAM, don’t worry about disk at all. If you have to drop to any kind of disk, even a Gen 5 SSD, your speeds will tank. Memory bandwidth matters so much more than compute for LLMs, but it all depends on your needs. There are probably cheaper ways to go about this if you just need something occasionally, maybe RunPod or something. If you need a lot of inference, then running locally could save you money, but renting a big machine with A100s will always be faster. So will a 7B model do what you need, or do you need the accuracy and comprehension of a 70B or one of the new 120B merges? Also, Llama 3 is supposed to be out in Jan/Feb, and if it’s significantly better then everything changes again.
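To put a number on the memory bandwidth point: for a dense model, generating each token means reading roughly all the weights once, so tokens per second is capped at about bandwidth divided by model size. The sketch below is only an upper-bound estimate under stated assumptions. DDR4-2400 is an assumed speed (the thread only says quad-channel DDR4), the function names are illustrative, and real throughput will be lower because of NUMA effects, cache behavior, and compute overhead.

```python
# Rough upper bound on CPU inference speed when memory bandwidth is the limit.
# Assumption: dense model, all weights read once per generated token.


def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s for one socket.

    Each DDR channel is 64 bits (8 bytes) wide, so
    channels * MT/s * 8 bytes gives MB/s; divide by 1000 for GB/s.
    """
    return channels * mt_per_s * bus_bytes / 1000


def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens/s: bytes available per second
    divided by bytes that must be read per token."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Assumed: quad-channel DDR4-2400 (the speed grade is a guess).
    bw = peak_bandwidth_gb_s(channels=4, mt_per_s=2400)
    # Weight sizes in GB: 7B and 70B at fp16, 70B at 4-bit.
    for name, size_gb in (("7B fp16", 14), ("70B fp16", 140), ("70B int4", 35)):
        print(f"{name}: at most ~{max_tokens_per_s(bw, size_gb):.2f} tokens/s")
```

Under these assumptions a 70B model at 16-bit tops out well below one token per second per socket, which is why quantization and smaller models matter so much on CPU-only boxes.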