https://github.com/QwenLM/Qwen
Also released was a 1.8B model.
From Binyuan Hui’s Twitter announcement:
“We are proud to present our sincere open-source works: Qwen-72B and Qwen-1.8B! Including Base, Chat and Quantized versions!
🌟 Qwen-72B has been trained on high-quality data consisting of 3T tokens, boasting a larger parameter scale and more training data to achieve a comprehensive performance upgrade. Additionally, we have expanded the context window length to 32K and enhanced the system prompt capability, allowing users to customize their own AI assistant with just a single prompt.
🎁 Qwen-1.8B is our additional gift to the research community, striking a balance between maintaining essential functionalities and maximizing efficiency, generating 2K-length text content with just 3GB of GPU memory.
We are committed to continuing our dedication to the open-source community and thank you all for your enjoyment and support! 🚀 Finally, Happy 1st birthday ChatGPT. 🎂”
https://preview.redd.it/sdofti9odg3c1.jpeg?width=1792&format=pjpg&auto=webp&s=d6f56d56c3596924ea61e1e5429018c0222907d2
Amazing capabilities on some benchmarks, if true.