drplan@alien.top to LocalLLaMA • NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM
Perfect. Next, please, a chip that can do half the inference speed of an A100 at 15 watts.
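For scale, here is a rough sketch of what that wish implies in perf-per-watt terms. All numbers are assumptions for illustration: 400 W is the A100 SXM board power, and the throughput figure is a pure placeholder, not a measured benchmark.

```python
# Back-of-the-envelope perf-per-watt comparison.
# Assumed: A100 SXM4 board power is 400 W; the throughput
# figure below is a placeholder, not a real measurement.

A100_TDP_W = 400.0
A100_TOKENS_PER_SEC = 3000.0  # hypothetical placeholder

# The wished-for chip: half the A100's speed at 15 W.
CHIP_POWER_W = 15.0
CHIP_TOKENS_PER_SEC = A100_TOKENS_PER_SEC / 2

a100_eff = A100_TOKENS_PER_SEC / A100_TDP_W   # tokens/sec per watt
chip_eff = CHIP_TOKENS_PER_SEC / CHIP_POWER_W

print(f"A100:            {a100_eff:.1f} tok/s/W")
print(f"Wished-for chip: {chip_eff:.1f} tok/s/W")
print(f"Efficiency gain: {chip_eff / a100_eff:.1f}x")
```

Note the placeholder throughput cancels out: the gain is just 400 / (2 × 15) ≈ 13x in tokens per joule, whatever the absolute speed.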
The most interesting part is the focus on biological sequence data. This means that generative AI for synthetic biology is on policy makers' and risk assessors' radar, and probably rightly so.
What now? Another couple of megawatt-hours spent training on mostly the same datasets with mostly the same architecture? I mean, I love LLMs and open source, but reinventing the wheel 100 times and spending so much energy on redundant results is somewhat pointless. There should be a community effort to achieve the best and most lasting models.