I used the colab template from this post: https://maximelabonne.substack.com/p/fine-tune-your-own-llama-2-model-in-a-colab-notebook-df9823a04a32
https://colab.research.google.com/drive/1PEQyJO1-f6j0S_XJ8DV50NkpzasXkrzd?usp=sharing
Specifically because it could be run on the free tier. But that’s not possible with every llama2 model, only some of them.
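For anyone curious, this is roughly what the 4-bit QLoRA setup in that kind of notebook looks like. Just a sketch: the model name and LoRA numbers below are placeholders I picked, not the notebook's exact config.

```python
# Minimal 4-bit QLoRA setup sketch, the kind of thing that lets a 7b llama2
# fit on a free-tier T4. Values here are illustrative, not a tuned recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "NousResearch/Llama-2-7b-chat-hf"  # assumption: any 7b llama2 checkpoint

# Load the base weights in 4-bit NF4 so they fit in free-tier VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Train small LoRA adapters instead of the full model
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
```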
I will. I also like 13b models; they seem like the perfect balance for us GPU-starved people. But I’d rather fail a few times on 7b models first, since it’s quicker to iterate on them.
I’m working on something similar, but different: https://github.com/neph1/LlamaTale
It will never allow the kind of freedom that a pure storytelling model gives you, but the goal is to provide a more stable experience. I’ve made a number of posts about it here (and the next one is due soon), but here’s the first:
https://www.reddit.com/r/LocalLLaMA/comments/152w71n/mud_llm_for_a_stronger_roleplaying_experience/
It runs fairly well with a 7b; https://huggingface.co/TokenBender/llama2-7b-chat-hf-codeCherryPop-qLoRA-merged is the one I play with, mostly.
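If you just want to poke at that checkpoint on its own, here's a minimal sketch of loading it in 4-bit and generating. This is not how LlamaTale itself wires the model in, just a quick way to try it; the prompt is made up.

```python
# Quick standalone test of the merged 7b checkpoint (not LlamaTale's integration):
# load in 4-bit and generate a short narration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TokenBender/llama2-7b-chat-hf-codeCherryPop-qLoRA-merged"

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "You are the narrator of a text adventure. The player enters a dusty tavern."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```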