How come Llama 2 70B is so much worse than Code Llama 34B?
I’m not talking specifically about coding questions, but the 70B seems utterly stupid… it repeats nonsense patterns, starts talking about unrelated stuff, and sometimes gets stuck in a loop of repeating the same word. It seems like utter garbage, and I downloaded the official model from Meta’s HF page.
Has anyone experienced the same? Am I doing something wrong with the 70B model?
Snowflake has a very nice comparison of the two:
Fine-Tuning Improves the Performance of Meta’s Code Llama on SQL Code Generation
The answer is you need more fine-tuning.
Did you forget to unset the RoPE settings?
Code Llama requires a different RoPE base frequency than regular Llama.
Also check your sampler settings.
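For context: Llama 2 was trained with a RoPE base frequency (rope_theta) of 10,000, while Code Llama was trained with 1,000,000. A minimal sketch using llama.cpp’s main binary, assuming GGUF quants of the official models (the file names are placeholders):

./main -m llama-2-70b.Q4_K_M.gguf --rope-freq-base 10000 -p "Hello"      # Llama 2's trained base
./main -m codellama-34b.Q4_K_M.gguf --rope-freq-base 1000000 -p "Hello"  # Code Llama's larger base

Feeding Code Llama’s 1e6 base to a Llama 2 model (or vice versa) scrambles the position embeddings and produces exactly the rambling, loop-prone output you describe.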
No, I didn’t even know RoPE was a thing; I’m reading about it now… if you have any tl;dr, please post it, this stuff seems pretty complicated.
I was loading the model with a llama.cpp invocation and didn’t know about RoPE. What would change if I left the default values on?
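If the model file’s metadata carries the training RoPE base (GGUF files do), a reasonably recent llama.cpp picks it up when you leave --rope-freq-base unset, so the defaults should already be correct for Llama 2 70B; looping output usually points at sampler settings instead. A hedged example invocation (these flags exist in llama.cpp; the values are illustrative starting points, not tuned ones):

# keep sampling moderate and penalize repeated tokens
./main -m llama-2-70b.Q4_K_M.gguf \
  --temp 0.7 --top-p 0.9 --top-k 40 \
  --repeat-penalty 1.1 \
  -n 256 -p "Explain rotary position embeddings in one sentence."

A repeat penalty slightly above 1.0 is the usual first fix for a model that gets stuck repeating the same word.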
worked great for me