Announcing Llama-rephraser: 13B models reaching GPT-4 performance in major benchmarks (MMLU/GSM-8K/HumanEval)!
To ensure result validity, we followed Open...
Yeah, people are praising 7B and 13B models here and there, but… they just hallucinate! Then there's the 120B Goliath, which, no matter how terrible its initial idea was, is just really good in normal conversations. I'm trying to love the giga-praised OpenHermes 2.5 and other Mistral finetunes, but they are just better next-token predictors, unlike larger models, which are actually able to reason.