- 1 Post
- 2 Comments
Joined 2 years ago
Cake day: November 27th, 2023
seanpuppy@alien.top B to LocalLLaMA • What kind of specs to run local llm and serve to say up to 20-50 users · English · 1 · 2 years ago

It depends a lot on the details, tbh. Do they share one model? Do they each use a different LoRA? If it's the latter, there's some cool recent research on efficiently hosting many LoRAs on one machine (rough sketch of the idea below).
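The comment doesn't name the specific research, but a minimal sketch of the idea, assuming vLLM's multi-LoRA serving support (model name, adapter name, and paths below are placeholders): the base model is loaded once, and each request selects its own lightweight adapter on top of the shared weights.

```python
# Sketch: one shared base model, per-user LoRA adapters chosen at request time.
# The model name and adapter path are placeholders, not from the original comment.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model once, with LoRA serving enabled.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True, max_loras=4)
params = SamplingParams(temperature=0.7, max_tokens=128)

# Each user (or tenant) is routed to their own adapter; base weights stay shared.
user_adapter = LoRARequest("user-42-adapter", 1, "/path/to/user-42/lora")
outputs = llm.generate(
    ["Draft a reply to this ticket: ..."],
    params,
    lora_request=user_adapter,
)
print(outputs[0].outputs[0].text)
```

Because only the small adapter weights differ per user, memory scales with one copy of the base model plus a few hundred MB of adapters, rather than one full model per user.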
I saw an interesting article somewhere that showed inference can be a lot more memory-efficient in Rust, since you don't have several GB of Python dependencies loaded alongside the model.