This is amazing. Yesterday we got Deepseek, and today we’re getting Qwen. Thank you for releasing this model!
I’m looking forward to seeing comparisons
Is there any free website where I can test those Chinese models? Thanks.
For Deepseek there is https://chat.deepseek.com/; for Qwen I don't know.
Heh, 72b with 32k and GQA seems reasonable. Will make for interesting tunes if it’s not super restricted.
If the US keeps going full woke and is too afraid to work as hard as possible on the LLM ecosystem, China won't think twice before winning this battle (which is basically the 21st-century battle in terms of technology)
Feels sad to see the US decline like that…
It would be great to see GGUF versions. (At least, my workflow right now goes via Ollama.) How are people running Qwen-72B locally right now?
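For anyone wondering the same thing: the route the Qwen repo itself documents is Hugging Face transformers with trust_remote_code enabled (Qwen ships custom modeling code). A minimal sketch; the generation call mirrors the repo's examples, but the hardware assumption (enough GPU memory to shard the 72B weights across your cards) is mine:

```python
# Minimal sketch: loading Qwen-72B-Chat via Hugging Face transformers.
# device_map="auto" shards the weights across available GPUs;
# trust_remote_code=True is required because Qwen uses custom modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-72B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Qwen's custom chat() helper returns (response, updated_history).
response, history = model.chat(tokenizer, "Hello, who are you?", history=None)
print(response)
```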
Now everyone is most interested in how much better it is than 70B Llama.
Very nice.
Why did so many new Chinese 70B-class foundation models release in a single day (this one, Deepseek, XVERSE)? Is there any reason they all came out in such a short window?
Today is the one-year anniversary of ChatGPT.
Can it beat 3.5-turbo?
The last Qwen didn't really take off as a base model for further fine-tunes.
Looking forward to the results on the German data protection training benchmark ;)
Would it be possible to merge it with Deepseek Coder 33B?
In my informal testing, Qwen-72B is quite good. I'd anecdotally rate it stronger than Llama 2 based on the few tests I've run.
What tests have you run it on?
I’m very interested in storytelling and RP
Is it censored?
Just when I’d talked myself out of getting a second 3090
The first thing I looked for was the number of training tokens. I think Yi-34B got a lot of benefit from its 3 trillion tokens, so this model also being trained on 3 trillion bodes well.
https://github.com/QwenLM/Qwen
Also released was a 1.8B model.
From Binyuan Hui’s Twitter announcement:
“We are proud to present our sincere open-source works: Qwen-72B and Qwen-1.8B! Including Base, Chat and Quantized versions!
🌟 Qwen-72B has been trained on high-quality data consisting of 3T tokens, boasting a larger parameter scale and more training data to achieve a comprehensive performance upgrade. Additionally, we have expanded the context window length to 32K and enhanced the system prompt capability, allowing users to customize their own AI assistant with just a single prompt.
🎁 Qwen-1.8B is our additional gift to the research community, striking a balance between maintaining essential functionalities and maximizing efficiency, generating 2K-length text content with just 3GB of GPU memory.
We are committed to continuing our dedication to the open-source community and thank you all for your enjoyment and support! 🚀 Finally, Happy 1st birthday ChatGPT. 🎂 “
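The two concrete claims in that announcement (a 1.8B chat model that loads in ~3GB, and customizing the assistant "with just a single prompt") map onto the system argument of Qwen's chat() helper. A sketch; the model ID follows Qwen's Hugging Face naming, and the system string is just an illustrative example:

```python
# Sketch: the small 1.8B chat model plus per-prompt assistant customization
# via the system prompt. The system string below is illustrative, not from
# the announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-1_8B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
).eval()

response, _history = model.chat(
    tokenizer,
    "Introduce yourself in one sentence.",
    history=None,
    system="You are a terse release-notes bot. Answer in one sentence.",
)
print(response)
```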
My heart skipped a beat because I thought it said Qwen-1.8T.
“we have expanded the context window length to 32K”
Kinda buried the lede here. This is far and away the biggest feature of this model. Here’s hoping it’s actually decent as well!
Well, it depends on how well it retains recall across the full context window. Did you see that comparison sheet on Claude and GPT-4? Astounding.
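For anyone who wants to run that kind of long-context recall check themselves, here's a minimal needle-in-a-haystack-style sketch. ask_model is a placeholder for whatever chat API you're testing, and the needle/filler strings are made up for illustration:

```python
# Minimal long-context recall probe: bury a "needle" fact at varying
# depths inside filler text and check whether the model retrieves it.
# ask_model() is a hypothetical stand-in for the API under test.

def build_haystack(needle: str, total_chars: int, depth: float) -> str:
    filler = "The sky is blue and the grass is green. "
    haystack = (filler * (total_chars // len(filler) + 1))[:total_chars]
    pos = int(total_chars * depth)  # where in the context to hide the needle
    return haystack[:pos] + " " + needle + " " + haystack[pos:]

def probe(ask_model, needle="The secret code is 7421.", total_chars=100_000):
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        prompt = build_haystack(needle, total_chars, depth)
        prompt += "\n\nWhat is the secret code? Answer with the number only."
        answer = ask_model(prompt)
        status = "PASS" if "7421" in answer else "FAIL"
        print(f"depth={depth:.2f} -> {status}: {answer!r}")
```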