You'll need a load balancer of some sort, but an A6000 would be a good start. Expect 15-20 tokens per second (tps) for a single user.
In vanilla form, Llama 2 may do silly stuff. Instruct variants, fine-tuning, etc. will decrease the likelihood.
If you are taking something to prod, I’d advise picking up a consultant to work with you.
It just sucks because the sweet spot is 48GB of VRAM, but a single card with that much is 3k USD at least.
At 1k you’ll be stuck at 24GB for a single card.
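
For a rough feel of why 24GB vs 48GB matters, here is a back-of-the-envelope sketch. It only counts the model weights (parameters times bits per parameter) and then pads that by an overhead fraction for KV cache and activations. The 20% overhead and the function names are my own assumptions for illustration, not measured numbers, so treat the output as a ballpark.

```python
# Rough VRAM estimate: weights only, plus an assumed overhead fraction.
# The 0.2 overhead default is an illustrative assumption, not a measurement.

def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def fits(params_billion: float, bits_per_param: int, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus the assumed overhead fit in the given VRAM."""
    needed = weight_memory_gb(params_billion, bits_per_param) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    # Llama 2 ships in 7B, 13B, and 70B parameter sizes.
    for size in (7, 13, 70):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(size, bits)
            print(f"{size}B @ {bits}-bit: ~{gb:.1f} GB weights, "
                  f"fits 24GB: {fits(size, bits, 24)}, "
                  f"fits 48GB: {fits(size, bits, 48)}")
```

Under these assumptions, a 70B model quantized to 4 bits needs about 35 GB for weights alone, which is out of reach for a 24GB card but fits on a 48GB one.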