Your wait-and-see approach is probably wise. The newly released GH200 chip leapfrogs the H100 by a considerable margin, and the H100 was already smoking the A100.
On the consumer side, there does not seem to be high demand for running local LLMs. That said, I ran a 7b model with GPT4All on my ultrabook from 2014, with a low-tier Intel 6th-gen CPU and 16 GB of RAM, and was getting about 2.5 tokens/second. It was super slow, but it just shows what would be possible with some optimizations on consumer hardware.
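For reference, here's a minimal sketch of how you could reproduce that rough tokens/second figure with the gpt4all Python bindings; the model filename is just an illustrative example, and counting whitespace-split words is only an approximation of tokens:

```python
import time
from gpt4all import GPT4All

# Illustrative 7b model file; gpt4all downloads it on first use.
model = GPT4All("mistral-7b-openorca.Q4_0.gguf")

prompt = "Explain what unified memory is in one paragraph."

start = time.time()
output = model.generate(prompt, max_tokens=128)
elapsed = time.time() - start

# Rough throughput: words aren't exactly tokens, but it's close
# enough to compare one machine against another.
n_words = len(output.split())
print(f"{n_words} words in {elapsed:.1f}s ~= {n_words / elapsed:.2f} words/sec")
```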
If you’re willing to spend $10k to run an esoteric 110b model, it might be worthwhile to go for hardware capable of training them in the first place (even if very slowly). Or consider a Mac with a large amount of memory built into the SoC (unified memory), which would likely run models at an acceptable rate with some optimizations, provided blistering performance isn’t necessary.
Otherwise, patience will likely pay off in the form of a solid model that works on consumer-grade components. The space seems keen on enabling general users and offering alternatives to transmitting data to some random server elsewhere. Just my opinion.