Hi, everyone. Xwin-Math aims to improve the mathematical reasoning capabilities of LLMs. We are now releasing the first version, a series of Llama 2 models trained with supervised fine-tuning (SFT) and chain-of-thought (CoT) prompts.

GitHub link: Xwin-LM/Xwin-Math at main · Xwin-LM/Xwin-LM (github.com)

Model link: Xwin-LM (Xwin-LM) (huggingface.co)

Gradio Demo: Gradio (70B model)

Math capability on the GSM8K and MATH benchmarks

The Xwin-Math-70B-V1.0 model achieves 31.8 pass@1 on the MATH benchmark and 87.0 pass@1 on the GSM8K benchmark, placing it first among all open-source CoT models.

The Xwin-Math-7B-V1.0 and Xwin-Math-13B-V1.0 models achieve 66.6 and 76.2 pass@1 on the GSM8K benchmark, ranking first among all Llama 2 based open-source models at 7B and 13B, respectively.

We also evaluate Xwin-Math on other benchmarks such as SVAMP and MAWPS. Xwin-Math-70B-V1.0 approaches or surpasses the performance of GPT-35-Turbo (8-shot) on most benchmarks.

The release also includes an evaluation toolkit that converts LaTeX formulas into SymPy objects more reliably, enabling a more accurate assessment of mathematical ability. We found that, due to evaluation constraints, the results of GPT-4 were previously underestimated.
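To illustrate why the comparison method matters, here is a minimal sketch of how plain string matching can mark a correct answer as wrong. It is not the Xwin-Math toolkit: it uses Python's standard `fractions` module as a simplified stand-in for a SymPy-based comparison, and the helper names `to_fraction` and `answers_match` are hypothetical.

```python
import re
from fractions import Fraction


def to_fraction(answer: str) -> Fraction:
    """Parse a simple numeric answer such as '0.5', '1/2' or '\\frac{1}{2}'."""
    text = answer.strip().strip("$")
    # Rewrite \frac{a}{b} as a/b; anything more complex is out of scope here.
    text = re.sub(r"\\frac\{(-?\d+)\}\{(\d+)\}", r"\1/\2", text)
    return Fraction(text)


def answers_match(predicted: str, reference: str) -> bool:
    """Compare by numeric value when possible, else fall back to exact strings."""
    try:
        return to_fraction(predicted) == to_fraction(reference)
    except (ValueError, ZeroDivisionError):
        return predicted.strip() == reference.strip()


# Exact string matching would score this as wrong, although the values agree.
print("0.5" == r"\frac{1}{2}")               # False
print(answers_match("0.5", r"\frac{1}{2}"))  # True
```

A symbolic comparison extends the same idea from plain numbers to full expressions.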

More information can be found in our GitHub repo. We fine-tune Llama 2 with standard SFT settings, using GPT-4 to augment the training sets of MATH and GSM8K to approximately 100K in total. Our paper is still in progress, so more training details and further results will be added soon.

Any suggestions or comments are greatly welcome! Thanks! =)

  • uti24@alien.topB
    1 year ago

    Prompt:

    I have 100 hit points, in 1 minute of fighting with monsters I lose 7 hit points, but also every minute I regenerate 2 hit points. How long I can fight with monsters?

    Answer:

    To determine how long you can fight with monsters, we need to consider the net loss of hit points per minute. You lose 7 hit points per minute but also regenerate 2 hit points per minute.

    Net loss per minute = Loss per minute - Regeneration per minute
    Net loss per minute = 7 - 2 = 5 hit points

    Now, we need to find out how many minutes it will take for you to lose all 100 hit points with a net loss of 5 hit points per minute.

    Let’s use the formula:

    Time (in minutes) = Total hit points / Net loss per minute

    Time (in minutes) = 100 / 5 = 20 minutes

    So, you can fight with monsters for 20 minutes before losing all your hit points.

    The answer is: 20.

    Actually, … I am impressed!

    But at the same time:

    I have 100 hit points, in 1 minute of fighting with monsters I lose 7 hit points, but also every minute I regenerate 2 hit points. I also kill 1 monster every 4 minutes I kill a monster and my current hit points increase by 1. I also need to have 20% of my hit points remain, so I can return to the base. How long I can fight with monsters?

    It can not solve it by a long shot. Tried multiple times. GPT-3.5 solved it flawlessly.

    • leelweenee@alien.topB
      1 year ago

      It can not solve it by a long shot.

      I tried your prompt, and interestingly enough it got the correct answer (16 mins), but the reasoning was very weird, using logs and whatnot.

        • uti24@alien.topB
          1 year ago

          rest are noise

          But why? If you increase the health regeneration parameter, it would affect the result.

          Also, why did it use logs then? That doesn't seem right.

          • pseudonerv@alien.topB
            1 year ago

            Of course not. I meant the LLM only needs to compute based on that equation alone. The rest, meh, it may hallucinate as much as it likes.

      • uti24@alien.topB
        1 year ago

        Well, it gave me very weird results on my updated prompt.

        For example, it said the result is 4 or something, also using logs.
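Both prompts in this thread can be sanity-checked with a minute-by-minute simulation. This is a minimal sketch under stated assumptions: hit points change once at the end of each minute, the kill bonus lands at the end of every 4th minute, and the fight must stop before hit points drop below the reserve. The function name `minutes_of_fighting` and its parameters are hypothetical, chosen only for illustration.

```python
def minutes_of_fighting(max_hp=100, loss=7, regen=2,
                        kill_every=0, kill_bonus=0,
                        reserve_percent=0, limit=10_000):
    """Count the full minutes of fighting that keep hit points at or above the reserve."""
    hp = max_hp
    reserve = max_hp * reserve_percent / 100
    for minute in range(1, limit + 1):
        next_hp = hp - loss + regen
        # Assumption: a kill completes at the end of every `kill_every`-th minute.
        if kill_every and minute % kill_every == 0:
            next_hp += kill_bonus
        if next_hp < reserve:
            return minute - 1  # this minute would break the reserve
        hp = next_hp
    raise ValueError("hit points never fall below the reserve within the limit")


# First prompt: no kills, no reserve.
print(minutes_of_fighting())  # 20

# Updated prompt: +1 hit point every 4 minutes, keep 20% in reserve.
print(minutes_of_fighting(kill_every=4, kill_bonus=1, reserve_percent=20))  # 16
```

Treating the kill bonus as a continuous 0.25 hit points per minute gives 80 / 4.75 ≈ 16.8, which also rounds down to 16 full minutes, so no logarithms are needed.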