On my Galaxy S21 phone, I can only run 3B models at an acceptable speed (CPU-only, 4-bit quantisation, with llama.cpp on Termux).
What is currently the ‘best’ 3B model for instruction following (question answering, etc.)?
Currently, I am using orca-mini-3B. See https://www.reddit.com/r/LocalLLaMA/comments/14ibzau/orcamini13b_orcamini7b_orcamini3b/
But I read on this forum that the ‘Marx 3B’ and ‘MambaGPT’ models are also considered good 3B models. See https://www.reddit.com/r/LocalLLaMA/comments/17f1gcu/i_released_marx_3b_v3 and https://huggingface.co/CobraMamba/mamba-gpt-3b-v4
Should I switch to one of these models or stay with orca-mini-3B? Unfortunately, there currently seems to be no Mistral-based 3B model.
Currently, the best 3B LLM on the Open LLM Leaderboard is GeneZC/MiniChat-3B.