Additional_Code@alien.top to LocalLLaMA • A100 inference is much slower than expected with small batch size • 1 year ago
In my personal experience, inference speed ranks, fastest to slowest: RTX 3090 > A100 > A6000.