So RWKV 7b v5 is 60% trained now, saw that multilingual parts are better than mistral now, and the english capabilities are close to mistral, except for hellaswag and arc, where its a little behind. all the benchmarks are on rwkv discor, and you can google the pro/cons of rwkv, though most of them are v4.
Thoughts?
Um The dataset is opensource, its all public HF datasets
“World = Some_Pile + Some_SlimPajama + Some_StarCoder + Some_OSCAR + All_Wikipedia + All_ChatGPT_Data_I_can_find”
“some” as in customized.