So Mistral-7b is a pretty impressive 7B param model … but why is it so capable? Do we have any insights into its dataset? Was it trained very far beyond the scaling limit? Any attempts at open reproductions or merges to scale up # of params?

  • qubedView@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    2 years ago

    Do people find that it holds up in use? Or are we mostly going on benchmarks? I’m skeptical of benchmarks, and a highly performant 7B model would be of great use.