I’ve been playing with a lot of models around 7B, but I’m now prototyping something that I think would be fine with a 1B model. The only model I’ve seen at this size is Phi-1.5, and I haven’t found a way to run it efficiently so far; llama.cpp still hasn’t implemented it, for instance.
Does anyone have an idea of what to use?
TinyLlama is 1.1B?
I mean, yeah, but it’s not done training AFAIK, and it’s not fine-tuned either.
Has anyone seen this? https://burn.dev/demo
You can try MLC-LLM (https://llm.mlc.ai/); it has tooling for running inference on quantized models in the browser.
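For reference, here is a minimal sketch of what in-browser inference can look like with MLC's WebLLM package, assuming the @mlc-ai/web-llm npm package and its CreateMLCEngine API; the model ID below is only illustrative, so check the prebuilt model list for what is actually available at ~1B scale.

```typescript
// Minimal sketch of in-browser inference via MLC's WebLLM package.
// Assumptions: @mlc-ai/web-llm is installed and the model ID below is
// illustrative -- substitute whatever sub-2B model the prebuilt list offers.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // First run downloads the quantized weights and compiles WebGPU kernels.
  const engine = await CreateMLCEngine("TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-style chat completion against the locally loaded model.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Name one logical fallacy and define it." }],
  });

  console.log(reply.choices[0].message.content);
}

main();
```

Everything runs client-side over WebGPU, so the main constraint is whether the model you want is already in the prebuilt list or needs to be compiled yourself.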
Deepseek-Coder has a 1B model that I believe outperforms some 13B models. I’ll check back once I find a link.
Edit: found it https://evalplus.github.io/leaderboard.html
Thanks! But I’m not looking for one that does coding, rather one that’s good at detecting fallacies and reasoning. Phi-1.5 seems a better fit for that.
I would still give it a try. It’s misleading to think these coding models are only good at that; training on code has actually been shown to improve scores across multiple benchmarks.
RWKV 1.5B. It’s SOTA for its size, outperforms TinyLlama, and uses no extra VRAM to fit its whole context length in the browser.