• 1 Post
  • 7 Comments
Joined 1 year ago
Cake day: November 9th, 2023


  • Makes sense that the benchmark results would be surprisingly low for Goliath. After playing around with it for a few days, I’ve noticed two glaring issues:

    • it tends to make slight spelling mistakes
    • it hallucinates words

    Both happen rarely, but frequently enough to throw off benchmarks. I’m fairly positive this can be solved by a quick full finetune over 100 or so steps, which would align the layers to work together better; a sketch of what that might look like is below.
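    A minimal sketch of that kind of short re-alignment finetune, using the Hugging Face Trainer. The repo id and the dataset here are illustrative assumptions, not a confirmed recipe, and a full finetune at this scale would realistically need DeepSpeed/FSDP sharding on top of this:

```python
# Sketch: ~100-step full finetune to re-align layers after a merge.
# Model id and dataset are placeholder assumptions, not a confirmed recipe.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "alpindale/goliath-120b"  # assumed Hub id for the merged model
tok = AutoTokenizer.from_pretrained(model_id)
tok.pad_token = tok.eos_token  # Llama-family tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Any small general-purpose corpus should do; wikitext is just a stand-in.
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
data = data.map(lambda b: tok(b["text"], truncation=True, max_length=1024),
                batched=True, remove_columns=data.column_names)
data = data.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty rows

args = TrainingArguments(
    output_dir="goliath-realign",
    max_steps=100,                    # the "100 or so steps" suggested above
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,               # low LR: nudge layers, don't retrain
    bf16=True,
    logging_steps=10,
)
Trainer(model=model, args=args, train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()
```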

  • The shearing process would likely need close to 1 billion tokens of data, so I’d guess a few days on ~24x A100-80GB/H100s. And if we get a ~50B model out of it, we’d need to train that on around ~100B tokens, which would need at least 10x H100s for a few weeks. Overall, very expensive; a rough back-of-envelope check is below.
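    A quick sanity check of those numbers using the common ~6 × params × tokens FLOPs rule of thumb. The peak-throughput and utilization figures are assumptions, not measurements:

```python
# Back-of-envelope training time from the usual 6 * N * D FLOPs estimate.
# Assumptions: dense bf16 H100 at ~1e15 FLOP/s, 40% model FLOPs utilization.
def train_days(params: float, tokens: float, n_gpus: int,
               peak_flops: float = 1e15, mfu: float = 0.40) -> float:
    total_flops = 6 * params * tokens
    seconds = total_flops / (n_gpus * peak_flops * mfu)
    return seconds / 86_400

# A ~50B model on ~100B tokens:
print(f"10x H100: {train_days(50e9, 100e9, 10):.0f} days")  # ~87 days
print(f"24x H100: {train_days(50e9, 100e9, 24):.0f} days")  # ~36 days
```

    Under these assumptions, 10 H100s lands closer to three months than a few weeks, which is why the “at least” is doing real work there.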

    And yes, princeton-nlp did a few shears of Llama 2 7B/13B. They’re up on their HuggingFace.
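    Loading one of them is straightforward; the repo id below is the 1.3B shear, quoted from memory, so double-check it on the Hub:

```python
# Sketch: load a sheared model from the Hub (repo id quoted from memory).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "princeton-nlp/Sheared-LLaMA-1.3B"  # assumed id; a 2.7B shear also exists
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tok("Structured pruning keeps", return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```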