What is everyone’s experiences so far with DPO trained versions of their favorite models? Been messing around with different models and my two new favorite models are actually just the DPO versions of my previous favorite models (causalLM 14b and openhermes 2.5 7b). Links below for the models in question.

CausalLM 14B-DPO-alpha - GGUF: https://huggingface.co/tastypear/CausalLM-14B-DPO-alpha-GGUF

NeuralHermes 2.5 Mistral 7B - GGUF: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF

The former runs at 30 t/s for me with koboldcpp-rocm on a 6900 XT, and the latter at 15 t/s, both at Q6K. I don’t have a favorite between these two models, they seem to be better at different things and trade blows in all the logic + creative writing tasks I’ve tested them in, despite causalLM being a larger model. I’m looking forward to seeing what nousresearch/teknium and CausalLM are bringing next.

  • fediverser@alien.top
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    This post is an automated archive from a submission made on /r/LocalLLaMA, powered by Fediverser software running on alien.top. Responses to this submission will not be seen by the original author until they claim ownership of their alien.top account. Please consider reaching out to them let them know about this post and help them migrate to Lemmy.

    Lemmy users: you are still very much encouraged to participate in the discussion. There are still many other subscribers on !localllama@poweruser.forum that can benefit from your contribution and join in the conversation.

    Reddit users: you can also join the fediverse right away by getting by visiting https://portal.alien.top. If you are looking for a Reddit alternative made for and by an independent community, check out Fediverser.

  • 1monster90@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    Out of topic here but is it just me or open hermes somehow speaks in russian unprovoked frequently?

    • VertexMachine@alien.topB
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      Interesting. I’m using oobabooga and that never happened to me. I actually don’t recall it ever outputting anything but English…

    • xadiant@alien.topB
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      High repetition penalty? One model I merged suddenly started speaking Spanish in one summarisation task lol

    • ttkciar@alien.topB
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      If you are using llama.cpp, you might want to give it a grammar which forces ASCII output.

  • sebo3d@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    I tried DPOpenHermes from TheBloke(Q6 GGUF version) and i love it but i think there’s an issue with an EOS token as for some reason the model just keep generating text way past where it should logically stop. I see myself using it more but i hope there will be an update that adresses the EOS issue.

    • Feztopia@alien.topB
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      There is already an updated version that is supposed to fix that (with additional training on top which lowered it’s overall capabilities apparently). I don’t know if TheBloke has it already. But I see the first set of dpo models as test runs the next ones should fix the issues (except for NeuralHermes, maybe it’s already good, I didn’t hear much feedback about it).

  • SomeOddCodeGuy@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    Wow, I’ve never seen an fp16 gguf before. Holy crap, I wish there were more of those out there; I’d love to get my hands on some for 70b models or the like. I didn’t realize unquantized gguf was an option