Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4

Legcor@alien.top · 2 years ago

Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4

Thistleknot@alien.top · 2 years ago

rm is the reward model… not the same as the lm model. I tried the lm, wasn’t impressed. Gpt-3.5 did better for summarizing quotes. It was good, but I honestly think open hermes and or synthia 1.3b do better

OccasionallyImmortal@alien.top · 2 years ago

It repeats itself and seems incapable of giving a response shorter than 200 words.

-Shasho-@alien.top · 2 years ago

They forgot to include the tokenizer files from openchat 3.5, which caused some weirdness for me with new line characters among other things in the GGUF I got from TheBloke. The original repo has been fixed but I have yet to see a new GGUF.

metalman123@alien.top · 2 years ago

Was wondering how long this would take to show up.

LocoMod@alien.top · 2 years ago

Quantz are up:

https://huggingface.co/TheBloke/Starling-LM-7B-alpha-GGUF/tree/main

r3tardslayer@alien.top · 2 years ago

Woohoo

PrometheusZer0@alien.top · 2 years ago

Does somebody have a prompt template for this? Trying to run in ollama

PrometheusZer0@alien.top · 2 years ago

Here’s what I’m using:

FROM starling-lm-7b-alpha.Q5_K_M.gguf

PARAMETER stop <|end_of_turn|>

PARAMETER stop <|im_sep|>

TEMPLATE """

GPT4 User: {{.Prompt}}<|end_of_turn|>GPT4 Assistant:

"""

visarga@alien.top · 2 years ago

how do you add your own gguf into ollama? it seems to be storing models as cryptic binary blobs in a folder.

PrometheusZer0@alien.top · 2 years ago

Basically yes. https://github.com/jmorganca/ollama#import-from-gguf (download the gguf from huggingface eg https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha)

dododragon@alien.top · 2 years ago

generate the sha256 hash using sha256sum your_model.gguf

rename your_model.gguf to “sha256:_hash_” (replace _hash_ with the actual hash)

move it to /usr/share/ollama/.ollama/models/blobs folder

copy a manifest from a similar model in /usr/share/ollama/.ollama/models/

manifests/registry.ollama.ai/library and update the hash & filesize to match your model in the “image.model” entry.

repeat last step for the params entry

you can call the manifest folder/file whatever you like

thereisonlythedance@alien.top · 2 years ago

I was sceptical, but darn it’s good. Mistral is a fantastic base and with this technique these guys have pushed it another step closer. A lot of the answers I’m getting are on on par with old GPT-4 (pre-turbo, turbo in the API is a step up on old GPT-4 IMO).

Dankmemexplorer@alien.top · 2 years ago

the model can have a little of the test data as a treat

Sweet_Protection_163@alien.top · 2 years ago

I can’t wait for the trustworthy closed sourced benchmarks. Can’t believe I’m saying that… but it’s honestly what we need.

liqui_date_me@alien.top · 2 years ago

Wonder if that’s a good startup idea? Something that can benchmark language models and charges a fee for doing so

allinasecond@alien.top · 2 years ago

That gap in coding is what makes me stay with GPT-4 until I don’t.

RelevantFoundation14@alien.top · 2 years ago

Have you tried DeepSeek, it’s pretty good at doing most things I’ve asked it to with Python.

geepytee@alien.top · 2 years ago

It is pretty good, just not as good

alexthai7@alien.top · 2 years ago

Do someone know why it writes the line feed code all the time in its answer ? <0 x 0 A>

Besides this, I find the model amazing.

HenkPoley@alien.top · 2 years ago

Has been fixed in the unquantized model. They forgot to upload the tokenizer files https://twitter.com/banghuaz/status/1729375878612922724?s=12

https://huggingface.co/TheBloke/Starling-LM-7B-alpha-GGUF/discussions/1#65657dc79bf6665f10ebd941

Looks like TheBloke hasn’t picked it up. But then it has only been an hour 😂

-Shasho-@alien.top · 2 years ago

It’s fixed now.

georgejrjrjr@alien.top · 2 years ago

If there is something somehow inherently superior about having a separate reward model, that should be teased out.

It would be nice to see stronger baselines / ablations for this reason. I realize it’s nigh impossible to keep up with the unrelenting pace of advances, so I don’t fault the authors here. That said, if there isn’t a compelling reason to keep the separate preference model, community people-hours will probably be best spent sticking with DPO/IPO to avoid the hyper-parameter tuning rabbit hole.

My guess: the way things are going, we’ll soon see a rough consensus emerge around a sane default DPO or Identity-PO recipe for fine-tunes (the same way we’ve seen gradual convergence around decoder-only transformer + rotational positional embeddings + group query attention + FlashAttention 2) to be applied absent a compelling reason to use a different reward signal.

No matter what, preference datasets like this are helpful. Pity about the license being claimed here, it’s hard to imagine it would hold up, but the specter is a bit of a hindrance.

jeffwadsworth@alien.top · 2 years ago

Hard to believe but can’t wait to try.

bot-333@alien.top · 2 years ago

“New RLAIF Finetuned 7b Model” Interesting. “beats Openchat 3.5” Nice! “and comes close to GPT-4” Bruh.

Evening_Ad6637@alien.top · 2 years ago

heheh i can’t read that any more… i really have become very prejudiced when comes to that… to be honest, when it comes to any comparison with GPT-4.

People have really to understand that even GPT-4 has been aligned, lobotomized and it has been massively downgraded in terms of its perfomance – due to security reasons (what is understandable for me), but anyway this thing still is an absolute beast. if we consider all the restrictions GPT-4 has to undergo, all the smartness at openAI, all the ressources at microsoft and so on, we have to realize that currently nothing is really comparable to GPT-4. Especially not 7B models.

noeda@alien.top · 2 years ago

I’ve seen the “… beats GPT-4” enough times that now whenever I see a title that suggests a tiny model can compete with GPT-4 I see it as a negative signal; that the authors are bullshitting through some benchmarks or some other shenanigans.

It’s annoying because the models might be legitimately good models for being open and within their weight class but now you’ve put my brain in BS detecting mode and I can’t trust you’ve done good faith measurement anymore.

Evening_Ad6637@alien.top · 2 years ago

Yeah I dont think authors are intentionally bullshitting or intentionally doing “benchmark cosmetics”, but maybe it’s more lack of knowledge on whats going on in terms of (most of) benchmarks and their the image that has become ruined in the meantime.

Competitive_Ad_5515@alien.top · 2 years ago

Sure, but name-dropping the biggest name in the game and comparing yourself favourably to it is a big swing. It’s either a naive at best marketing claim or it’s untrue.

bot-333@alien.top · 2 years ago

There are SO many models “bullshitting through some benchmarks or some other shenanigans” that I’m cooking my own benchmark system LOL.

Kep0a@alien.top · 2 years ago

Yeah I just roll my eyes and continue onwards

noeda@alien.top · 2 years ago

The first image posted; looks like it’s not even close to GPT-4?

Real-Elk-6109@alien.top · 2 years ago

Considering “close” as a relative word, it came closer than other open-source models. But you have a point too.

pseudonerv@alien.top · 2 years ago

Form huggingface model card,

Starling-RM-7B-alpha is a reward model trained from Llama2-7B-Chat.

From their webpage, https://starling.cs.berkeley.edu

Our reward model is fine-tuned from Llama2-7B-Chat

Yet, the model config.json

"max_position_embeddings": 8192,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,

SO? Whoever is doing the PR has no f***ing idea what their student labors are actually doing.

visarga@alien.top · 2 years ago

yeah I was put off by the lack of mention on the base model

Warm_Shelter1866@alien.top · 2 years ago

What does it mean that an LLM is a reward model ? , I always thought of rewards only in the RL field . And how would the reward model be used during finetuning?

OC2608@alien.top · 2 years ago

How to earn VC money 101: “Beats GPT-4!”

And voila! you’re rich now.

_Lee_B_@alien.top · 2 years ago

And voila! You work for investors now.