Ilya Sutskever from OpenAI co-authored a paper (2020) that may be related to Q*: it describes GPT-f, a model that can understand and solve math problems through automated theorem proving.
https://arxiv.org/abs/2009.03393
When an AI model can understand and actually do math, that is a critical jump.
Strange, I thought they would naturally be rewarding the process, for example by rewarding each word that’s generated by the sequence-to-sequence model rather than only the final words. Maybe they over-optimised and skipped training on all of the output.
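
To make the distinction concrete, here is a minimal Python sketch of the two reward schemes. It is only an illustration, not anything taken from OpenAI or the linked paper: the function names and the toy per-token scorer are hypothetical.

```python
from typing import Callable, List


def outcome_rewards(tokens: List[str], is_correct: bool) -> List[float]:
    """Outcome-style reward: only the final token carries a signal."""
    rewards = [0.0] * len(tokens)
    if tokens:
        rewards[-1] = 1.0 if is_correct else 0.0
    return rewards


def process_rewards(
    tokens: List[str],
    score_token: Callable[[List[str], str], float],
) -> List[float]:
    """Process-style reward: every generated token is scored given its prefix."""
    return [score_token(tokens[:i], tok) for i, tok in enumerate(tokens)]


def toy_scorer(prefix: List[str], token: str) -> float:
    """Hypothetical scorer: reward any non-empty token (stand-in for a real judge)."""
    return 1.0 if token else 0.0


tokens = ["2", "+", "2", "=", "4"]
print(outcome_rewards(tokens, is_correct=True))
print(process_rewards(tokens, toy_scorer))
```

With outcome-style rewards, the early tokens get no direct signal; with process-style rewards, each step is scored as it is generated.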