No-Link-2778@alien.top to LocalLLaMA • Would a merge between Neural Chat 7B v3.1 and OpenHermes-2.5 work?
Do you think there is any scientific basis for the merge? This is medieval alchemy again.
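For context, the "merge" under discussion usually means direct weight-space interpolation. Below is a minimal sketch, assuming plain linear interpolation between the two Mistral-7B fine-tunes named in the title; the alpha value is an arbitrary illustrative choice, not anything from the thread:

```python
# A sketch of plain linear weight-space merging, assuming both models
# share the Mistral-7B architecture so their state dicts line up key
# for key. Model IDs are taken from the thread title; alpha = 0.5 is
# an arbitrary illustrative value, not a recommendation.
import torch
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained(
    "Intel/neural-chat-7b-v3-1", torch_dtype=torch.float16
)
b = AutoModelForCausalLM.from_pretrained(
    "teknium/OpenHermes-2.5-Mistral-7B", torch_dtype=torch.float16
)

alpha = 0.5  # interpolation weight; picking it is the "alchemy" part
merged = a.state_dict()
for name, tensor in b.state_dict().items():
    merged[name] = (1.0 - alpha) * merged[name] + alpha * tensor

a.load_state_dict(merged)
a.save_pretrained("neural-hermes-linear-merge")
```

Tools such as mergekit wrap variants of this (SLERP, TIES, and so on), but the core operation is this interpolation; whether it preserves either model's abilities is exactly the empirical question being raised here.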
No-Link-2778@alien.top to LocalLLaMA • 🚀 Launching SauerkrautLM-7b-HerO: A New Era in German Language Modeling!
Do you think there is any scientific basis for the merge? This is medieval alchemy again. I also hope you will publish some of the data that you, as a native speaker, have vetted; that would do more for public research than merging without a theoretical basis just to improve benchmark scores.
Its own data is public, but the OpenHermes-2.5 dataset is gated and not accessible.
No-Link-2778@alien.top to LocalLLaMA • Yi-34B vs Yi-34B-200K on sequences <32K and <4K
I trained on Books3 for a day across a number of GPUs with the 200K models, both 34B and 6B, and the results are total garbage.
It is not a BASE model at ALL. It sometimes even identifies itself as GPT. It behaves like a model that was SFT-tuned on benchmark-formatted data.
Try it yourself before doing anything silly: you will not notice it immediately during SFT, but you will sooner or later.
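For anyone who wants to reproduce the check, here is a minimal sketch, assuming the public 01-ai/Yi-34B-200K checkpoint on Hugging Face and an arbitrary raw-text prompt (both are my assumptions, not details from the comment). A genuine base model should simply continue the text, while SFT contamination tends to surface as assistant-style or benchmark-formatted replies:

```python
# A sketch of the spot check described above: feed a raw text prompt
# with no chat template and see whether the model continues it like a
# base model or slips into assistant / benchmark-style output
# (e.g. "As an AI language model ...").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-34B-200K"  # assumed checkpoint, not from the comment
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# An arbitrary raw-continuation probe with no instruction formatting.
prompt = "The old lighthouse keeper climbed the stairs and"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```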
Will you consider releasing the non-DPO version? There seems to be a regression on NLP tasks compared with the original SFT model.