aliencaocao@alien.top to LocalLLaMA • NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM • 1 year ago
That's at batch size 1024, though… not a personal use case.
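For a rough sense of what that means per user, here is a back-of-the-envelope sketch. It assumes the headline figure is aggregate throughput across the whole batch and that it is split evenly between sequences, neither of which is stated in the title. The 12,000 is the "nearly 12,000" from the title rounded up, so treat the result as an upper bound. The function name is just for illustration.

```python
def per_stream_tokens_per_sec(aggregate_tokens_per_sec: float, batch_size: int) -> float:
    """Per-sequence throughput, assuming the aggregate is shared evenly across the batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return aggregate_tokens_per_sec / batch_size


# Assumption: "nearly 12,000 tokens/sec" rounded up to 12,000, measured at batch size 1024.
headline_tokens_per_sec = 12_000
batch_size = 1024

rate = per_stream_tokens_per_sec(headline_tokens_per_sec, batch_size)
print(f"{rate:.1f} tokens/sec per sequence")  # prints "11.7 tokens/sec per sequence"
```

So under those assumptions each individual stream sees roughly 12 tokens/sec, and a single-user, batch-size-1 run would not get anywhere near the headline number.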