Warning, this is still work in progress.
https://huggingface.co/piotr-ai/polanka-7b-v0.1
First version of 7b Polish LLM finetuned using custom data in Polish language.
As a base model I used uncensored https://huggingface.co/ehartford/dolphin-2.1-mistral-7b so Dolphin “personality” should also be there.
It was trained using 4K context in ChatML format. All done on a single 4090 for multiple days.
I hope we can get quantized gguf soon from the legendary TheBloke
I have tested several small 7B models for speaking Polish and it seems to me that currently openchat_3.5.Q4_K_S.gguf
is probably the best.
Of course this was not a large-scale study, so it is not necessarily 100% true ;)
And I look forward to the final release 👍Thanks! For the record, that version is very under-trained. Today I started to train on much bigger dataset (50k entries) that is mostly built from the wikipedia.