I have a cluster of 4 A100 GPUs (4x80GB) and want to run meta-llama/Llama-2-70b-hf. I’m a beginner and need some guidance.
- Need a script to run the model.
- Is 4xA100 enough to run the model, or is it more than required?
- Need the model for inference only.
Apparently TensorRT-LLM is fairly good: https://twitter.com/abacaj/status/1722008290324807914
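For a rough sanity check on the memory question, here is my back-of-envelope math (my own assumptions: fp16/bf16 weights at 2 bytes per parameter, and `weights_fit` is just a hypothetical helper I wrote, not part of any library):

```python
# Rough memory check: do the raw model weights fit across the GPUs?
# Assumptions (mine, not authoritative): fp16/bf16 weights = 2 bytes/param;
# keep ~20% headroom free for KV cache and activations.

def weights_fit(num_params: float, bytes_per_param: int,
                num_gpus: int, gpu_mem_gb: int,
                headroom: float = 0.8) -> bool:
    """True if raw weights fit within `headroom` fraction of total GPU memory."""
    weights_gb = num_params * bytes_per_param / 1e9
    usable_gb = num_gpus * gpu_mem_gb * headroom
    return weights_gb <= usable_gb

# Llama-2-70B in fp16: ~70e9 params * 2 bytes ≈ 140 GB of weights.
# 4 x A100-80GB = 320 GB total (256 GB after headroom), so it should fit.
print(weights_fit(70e9, 2, 4, 80))  # → True
```

By this estimate 4x80GB is enough for fp16 inference with room left for the KV cache, while fp32 (4 bytes/param, ~280 GB) would be tight. If I understand correctly, Hugging Face `transformers` can shard the weights automatically with `AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf", device_map="auto", torch_dtype=torch.float16)`, assuming `accelerate` is installed.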