I have a Stealth 15M laptop with 16 GB of RAM and an RTX 3060 with 6 GB of VRAM. Can this run 13B models decently well? I'm pretty new to LLM stuff, and so far I can only generate around 2-3 tokens a second, which feels pretty slow. Is there any way I can bump that to 5+ tokens per second, or is 2-3 tokens per second the limit of my laptop?
- If you download GGUF models from TheBloke, the model card page lists how much RAM each quantization requires without offloading to the GPU. I've included a screenshot of a 13B model as an example.
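A rough back-of-envelope sketch of why 13B is tight on this hardware (the 4.5 and 8.5 bits-per-weight figures are approximations for Q4_K_M and Q8_0, and the 40-layer count and 1 GB overhead are assumptions, not values from any specific model card):

```python
def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate memory footprint of a quantized model in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

# Typical quantizations of a 13B model (approximate bits/weight).
q4_size = model_size_gb(13, 4.5)   # Q4_K_M, roughly 4.5 bits/weight
q8_size = model_size_gb(13, 8.5)   # Q8_0, roughly 8.5 bits/weight

# With 6 GB of VRAM, estimate how many of the ~40 transformer layers
# of a 13B Llama-style model fit on the GPU, leaving ~1 GB headroom
# for the KV cache and CUDA buffers (both numbers are assumptions).
layers = 40
per_layer_gb = q4_size / layers
offloadable = int((6 - 1) / per_layer_gb)

print(f"Q4_K_M ~{q4_size:.1f} GB, Q8_0 ~{q8_size:.1f} GB")
print(f"~{offloadable} of {layers} layers fit in 6 GB VRAM")
```

On these assumptions a Q4 13B model is around 7 GB, so only part of it fits in 6 GB of VRAM; the rest runs from system RAM, which is what drags generation down to a few tokens per second. Offloading as many layers as fit (e.g. llama.cpp's `--n-gpu-layers` option), or dropping to a smaller quantization or a 7B model, is the usual way to speed this up.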

