I use q4_K_M in both cases.
Are you talking about the base Yi-34B or a fine-tuned one? The base model will be hard to use but will score pretty high. Benchmarks are generally written with completion in mind, so they work really well on base models; instruct tuning may make a model much easier to work with, but not necessarily score higher on benchmarks.
Does anyone have a setup that works with a Yi-34B model on 12GB VRAM and 32GB RAM? I have tried using GGUF but I always end up with `<s>` in the answer. I'm on Oobabooga, and any specific settings to overcome this hurdle with GGUF would be greatly appreciated!
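Not a fix for the underlying loader issue, but as a stopgap you can strip the stray markers in post-processing. A minimal sketch, assuming the leaked tokens are the usual Llama-style special tokens (adjust the list to whatever you actually see):

```python
import re

# Llama-style special tokens that sometimes leak into decoded text.
# (Assumption: this list covers the ones showing up in your output.)
SPECIAL_TOKENS = ["<s>", "</s>", "<unk>"]

def strip_special_tokens(text: str) -> str:
    """Remove stray BOS/EOS/UNK markers and tidy the whitespace left behind."""
    for tok in SPECIAL_TOKENS:
        text = text.replace(tok, "")
    return re.sub(r"[ \t]+", " ", text).strip()

print(strip_special_tokens("<s> Hello there!</s>"))  # -> Hello there!
```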
I had very bad experiences using all the Yi models until recently. Going to chalk it up as user error on my part. LoneStriker_Capybara-Tess-Yi-34B-200K-DARE-Ties-4.0bpw-h6-exl2 is really good. I made sure to have all the right settings.
While benchmarks tend to be gamed, especially by small models, I honestly think something is wrong with how you're running it.
Yi-34B trades blows with Llama 2 70B in my personal tests, on novel tasks I invented myself, not the gamed benchmarks.
ALL 7B models are like putting a 7-year-old up against a renowned professor when compared to 34B and 70B models.
Same experience here. I got excellent results from quantized Intel-Neural-7B and Mistral-7B models but bad results with a quantized Yi-34B.
I'm not sure what the point of Neural-7B is, given that it's a super-censored corporate safety bot. If that's what people want, they might as well just use ChatGPT, which is faster and otherwise better.
neural-chat from Intel is not censored! Just use a good system prompt.
Privacy and cost. Also no, a 7B is as fast as or faster than ChatGPT, depending on ChatGPT's load.
I’m curious what results you’re seeing from the Yi models. I’ve been playing around with LoneStriker_Nous-Capybara-34B-5.0bpw-h6-exl2 and more recently LoneStriker_Capybara-Tess-Yi-34B-200K-DARE-Ties-5.0bpw-h6-exl2 and I’m finding them fairly good with the right settings. I found the Yi 34B models almost unusable due to repetition issues until I tried settings recommended in this discussion:
https://www.reddit.com/r/LocalLLaMA/comments/182iuj4/yi34b_models_repetition_issues/
I’ve found it much better since.
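For anyone who doesn't want to dig through the thread: as I recall, the recommendations center on Min-P sampling with the repetition penalty dialed way down. Here's a toy sketch of what Min-P filtering actually does to a token distribution (token names and logit values are made up):

```python
import math

def min_p_filter(logits: dict[str, float], min_p: float = 0.05) -> dict[str, float]:
    """Keep only tokens whose probability is at least min_p times the
    top token's probability, then renormalize the survivors."""
    # Softmax over the logits (subtract the max for numerical stability).
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}

    # The cutoff scales with how confident the model is.
    threshold = min_p * max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= threshold}
    z = sum(kept.values())
    return {t: p / z for t, p in kept.items()}

# The junk token gets pruned outright, so you don't need a heavy
# repetition penalty (which is what seems to hurt the Yi models).
print(min_p_filter({"the": 5.0, "a": 4.0, "zebra": -2.0}, min_p=0.05))
```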
I tried out one of the Neural models and found it couldn't keep track of details at all. I wonder if my settings weren't very good or something. I would have been using an EXL2 or GPTQ version, though.
I found the Yi 34B models almost unusable due to repetition issues until I tried settings recommended in this discussion:
I have the same issue with LoneStriker_Nous-Capybara-34B-5.0bpw-h6-exl2. Whole previous messages will often get shoved into the response. I basically gave up and went back to Mistral-OpenHermes.
To stop the repetition, you could try adding '### Human' as a stop string; that works well for me.
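For what it's worth, a stop string is just a substring check applied to the generated text on the frontend side. A minimal sketch of the idea (substitute whatever markers your prompt format actually uses):

```python
def truncate_at_stop(text: str, stop_strings: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

reply = "Sure, here you go.\n### Human: thanks"
print(truncate_at_stop(reply, ["### Human", "### Assistant"]))  # -> Sure, here you go.
```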
Capybara doesn’t use Alpaca format, so that wouldn’t do anything. Regardless, it’s not that type of repetition. It’s not speaking for the user, it’s literally just copy/pasting part of the conversation into the answer.
I've had the same experience with the Yi finetunes. I tried them on single-turn generations and they were very promising. However, starting a conversation from scratch, I was getting a ton of repetition and looping. Some models need a very tight set of parameters to perform well, whereas others will function well under almost any sane settings. I think Yi leans toward the former, which will leave users thinking it's inferior to simpler but more flexible models.
It's a data point, but synthetic benchmarks rarely give you the whole picture. Plus, those test sets are public, so there is an incentive for some people to game the system (and even without that, those datasets are most likely already in the training data).
I’ve had the same experience. Are you using GGUF? I do, and I’ve heard that Yi may suffer from GGUF. So EXL2 might be better… I need to try it and see.
Reliable? It never was. Informative to a certain extent, yes.
You are hallucinating?
My private finetunes are about text rewriting - input text paragraph - rewrite it in a certain style.
No 7B finetuned model can grasp the idea of the submitted text in its entirety; I've tried maybe 100 different runs. They make the kind of mistake someone would make skimming the text quickly while also watching YouTube on their phone: failing to comprehend who is who or what the paragraph is about.
A 13B with the same finetuning does much better: it comprehends the relations. For example, if two people are speaking, it can keep track of who is who, even when that isn't spelled out in the text.
A 33B goes even further; it sometimes surprises me with how well it understands the text, and the rewritten text is a mirror image of the input, just in a different style.
7B models are impressive if you want a small local LLM to answer questions, but that's probably the limit. If you want an assistant that can also do other things, it falls short, because your instructions are not necessarily understood fully.
Most of the benchmarks seem to measure regurgitation of factual knowledge from in-weights learning, which IMO everyone should accept is a misguided task, instead of testing in-context learning, which I would argue is the actual goal of LLM training. I'd say they are probably harmful to the cause of improving future LLMs.
I agree, and the Leaderboard's newly added DROP metric is a step in the right direction.
You are hallucinating?
90% of the time a bigger model is “worse” because…
A) I messed up the prompt format
B) (For roleplaying) Smaller models seem more creative because they're less consistent. But after a few messages, that lack of consistency makes them really bad.
yeah, no lol
No 7B model is going to beat a 34B model anytime soon.