I don’t know OP, and the below is not aimed at him. But most people call stuff ‘unbiased’ if it’s aligned with their own biases. “Outsmarting” your own brain and self-awareness at the meta level is really hard.
0 Posts · 10 Comments
VertexMachine@alien.top to LocalLLaMA • Amazon Introduces Q, an A.I. Chatbot for Companies (English)
1 · 2 years ago
I think they have more than 700M customers.
I think they wouldn’t have a problem negotiating a custom license.
VertexMachine@alien.top to LocalLLaMA • Evaluate, monitor, and safeguard your LLM-based apps (English)
1 · 2 years ago
Ha! Just as I started writing my own thing for that. Will def take a look! :)
VertexMachine@alien.top to LocalLLaMA • 100B, 220B, and 600B models on huggingface! (English)
1 · 2 years ago
I doubt there’s any model there.
VertexMachine@alien.top to LocalLLaMA • 100B, 220B, and 600B models on huggingface! (English)
1 · 2 years ago
I doubt there is any model, really… follow the trail and you’ll end up at a company founded by a single person from India (who is also the founder of another company with a single app for collaborative drawing)… and that company, at least, doesn’t have any employees on LinkedIn…
And the founder looks like a relatively young person who most likely wouldn’t even be able to gather the funding required for enough GPU compute to make a model better than GPT-4 (or have the know-how). I think it’s just a front for him trying to get some hype or funding.
VertexMachine@alien.top to LocalLLaMA • Models Megathread #2 - What models are you currently using? (English)
1 · 2 years ago
- LoneStriker_OpenHermes-2-Mistral-7B-8.0bpw-h6-exl2 - my generic goto
- LoneStriker_airoboros-l2-70b-3.1-2.4bpw-h6-exl2 - this one (and the whole family) is great for creative and precise tasks. If they don’t work, I jump to WizardLM or Vicuna.
- oobabooga_CodeBooga-34B-v0.1-EXL2-4.250b and phind-codellama-34b-v2.Q4_K_M.gguf are great for coding. I haven’t decided which one is better yet.
VertexMachine@alien.top to LocalLLaMA • Is Open LLM Leaderboard reliable source ? yi:34B is at the top but I get better results with neural-chat:7B model (English)
1 · 2 years ago
It’s a source. But synthetic benchmarks rarely give you the whole picture. Plus those test sets are public, so there is some incentive for some people to game the system (and even without that, those data sets are most likely already in the training data).
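One crude way to check the contamination point above is to look for verbatim n-gram overlap between benchmark items and a slice of training text. A minimal sketch (the function name and parameters are my own, not from any benchmark tooling):

```python
def ngram_overlap(benchmark_item: str, corpus_text: str, n: int = 8) -> bool:
    """Return True if any n-word shingle of the benchmark item
    appears verbatim in the corpus text (whitespace-normalized)."""
    words = benchmark_item.lower().split()
    corpus = " ".join(corpus_text.lower().split())
    # Slide an n-word window over the benchmark item; short items
    # (fewer than n words) are checked as a single shingle.
    return any(" ".join(words[i:i + n]) in corpus
               for i in range(max(1, len(words) - n + 1)))
```

Real contamination checks (e.g., the ones reported by benchmark authors) are more elaborate, but even this kind of exact-match shingling catches a surprising amount of leakage.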
I’m seconding that. I’m actually amazed by how it performs, frequently getting similar or better answers than bigger models. I’m starting to think that we do lose a lot with quantization of the bigger models…
VertexMachine@alien.top to LocalLLaMA • Open LLM Leaderboard vs Reality: How do you evaluate "good" ? (English)
1 · 2 years ago
You either do standardized benchmarks like that leaderboard (which are useful but limited) or you build your own application-specific benchmark. Most often the latter is very, very time- and work-consuming to do right. Evaluating NLP systems in general is a very hard problem.
Some people use more powerful models to evaluate weaker ones, e.g., using GPT-4 to evaluate the output of LLaMA. Depending on your task it might work well. I recently ran an early version of an experiment with around 20 models on text summarization, where GPT-4 and I both evaluated the summaries (on a predefined scale, with predefined evaluation criteria). I haven’t calculated any proper measure of inter-annotator agreement yet, but looking at the evals side by side, it’s really high.
Or if you are just playing around, you just write/search for a post on reddit (or various LLM-related discords) asking for the best model for your task :D
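The inter-annotator agreement mentioned above can be quantified with Cohen’s kappa, which corrects raw agreement for chance. A minimal sketch, assuming both raters scored the same items on the same discrete scale (the scores below are hypothetical, not my actual experiment data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance: sum over labels of p_a(label) * p_b(label).
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical 1-5 quality scores for 10 summaries
human = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]
gpt4  = [5, 4, 3, 3, 5, 2, 4, 4, 5, 4]
print(round(cohens_kappa(human, gpt4), 3))  # → 0.714
```

Values above ~0.6 are conventionally read as substantial agreement; for an ordinal scale like 1–5, weighted kappa (which penalizes near-misses less) would be the more appropriate choice.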
Interesting. I’m using oobabooga and that has never happened to me. I actually don’t recall it ever outputting anything but English…