We’ve put together an article using some guesstimates of what hardware an enterprise would need to deploy LLMs on-prem.
https://bionic-gpt.com/blog/llm-hardware/
In short, I’m estimating $20,000 in hardware costs per 1000 users, minimum.
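To make the estimate easier to sanity-check, here is a rough sketch that scales the $20,000-per-1,000-users figure to other user counts. It assumes, purely for illustration, that hardware is bought in whole 1,000-user blocks (the figure is stated as a minimum), so small deployments still pay for a full block. The function name and block model are hypothetical, not taken from the article.

```python
import math

# Assumption: $20,000 of hardware per 1,000 users (the figure above),
# purchased in whole 1,000-user blocks. Block purchasing is hypothetical.
COST_PER_BLOCK_USD = 20_000
USERS_PER_BLOCK = 1_000


def min_hardware_cost(users: int) -> int:
    """Rough lower bound on hardware spend for a given number of users."""
    if users <= 0:
        return 0
    # Round up: 1,001 users still needs a second block.
    blocks = math.ceil(users / USERS_PER_BLOCK)
    return blocks * COST_PER_BLOCK_USD


for n in (250, 1_000, 5_000, 12_500):
    cost = min_hardware_cost(n)
    print(f"{n:>6} users -> ${cost:,} (~${cost / n:,.0f} per user)")
```

At full blocks this works out to about $20 of hardware per user; the per-user cost only rises when a block is partly empty.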
I’d be grateful if people could give me some feedback on the numbers and whether my assumptions look realistic.
Thanks


It’s extremely overpriced. With INT4 quantization, llama.cpp posts even better numbers. A system with 4090s can be built for about $2,500 in India, and surely for less elsewhere.
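A back-of-envelope sketch of why INT4 changes the hardware picture: weight memory scales with bits per weight, so going from FP16 to 4-bit cuts it by roughly 4x. This assumes weights dominate memory and ignores the KV cache, activations, and the extra scale data that real 4-bit formats store, so actual files are somewhat larger. The model sizes are illustrative only; 24 GB is the VRAM of a single RTX 4090.

```python
# Rough weight-memory estimate for a (possibly quantized) model.
# Assumptions: only the weights are counted; KV cache, activations and
# quantization scale overhead are ignored; 1 GB = 1e9 bytes.


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


RTX_4090_VRAM_GB = 24  # single consumer card, for comparison

for size in (7, 13, 70):
    fp16 = weight_memory_gb(size, 16)
    int4 = weight_memory_gb(size, 4)
    fits = "fits" if int4 <= RTX_4090_VRAM_GB else "does not fit"
    print(f"{size}B params: FP16 ~{fp16:.1f} GB, INT4 ~{int4:.1f} GB "
          f"(INT4 {fits} in one 4090)")
```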
Didn’t Nvidia ban the use of consumer-grade cards for professional use? For a datacenter you would need A100s or similar.