Do you think there is any scientific basis for the merge? This is medieval alchemy again. I also hope you can release some of the data that you, as a native speaker, can vouch for; that would benefit public research far more than merging without any theoretical basis just to improve "score performance".
Its data is public, but the OpenHermes-2.5 dataset is gated and not accessible.
I have trained on Books3 for a day on a number of GPUs, on the 200K 34B & 6B, and the result is totally garbage.
It is not a BASE model at ALL. It sometimes even identifies itself as GPT. It was an SFT model trained on benchmark-style formats.
Try it before you do silly things; you won't notice it immediately with SFT, but you will sooner or later.
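If anyone wants to reproduce this check, here is a minimal sketch (the model id below is a placeholder, not the actual repo): it simply asks the checkpoint who it is. A true base model should just continue the text, while an SFT'd checkpoint will often answer in assistant style and may claim to be GPT.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-checkpoint"  # placeholder, swap in the checkpoint under discussion
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A base model should continue the prompt as plain text; an SFT'd model
# tends to reply as an assistant (e.g. "I am ChatGPT / a language model...").
prompt = "Who are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```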
Will you consider the non-DPO one? There seems to be a regression on NLP tasks compared with the original SFT model.