rihard7854@alien.top to LocalLLaMA · English · 1 year ago
NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM (github.com) · 23 comments
aliencaocao@alien.top · 1 year ago
Batch size 1024 though… not for a personal use case.
Herr_Drosselmeyer@alien.top · 1 year ago
Obviously. There aren’t many people in the world with 50k burning a hole in their pockets, and of those, even fewer are nerdy enough to want to set up their own AI server in their basement just to tinker with.
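A minimal sketch of the per-user math behind the batch-size point above, assuming the headline ~12,000 tokens/sec is aggregate throughput across all 1024 concurrent sequences (the figures below are taken from the headline and the comment, not independently measured):

```python
# Back-of-the-envelope: aggregate benchmark throughput divided across
# the whole batch gives the rate any single user would actually see.
aggregate_tps = 12_000   # headline throughput, tokens/sec (all streams combined)
batch_size = 1_024       # concurrent sequences in the benchmark

per_stream_tps = aggregate_tps / batch_size
print(f"~{per_stream_tps:.1f} tokens/sec per individual stream")
```

So a single user in that batch would see on the order of a dozen tokens per second, which is why the headline number mostly matters for high-concurrency serving rather than a personal setup.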