https://github.com/QwenLM/Qwen
Also released was a 1.8B model.
From Binyuan Hui’s Twitter announcement:
“We are proud to present our sincere open-source works: Qwen-72B and Qwen-1.8B! Including Base, Chat and Quantized versions!
🌟 Qwen-72B has been trained on high-quality data consisting of 3T tokens, boasting a larger parameter scale and more training data to achieve a comprehensive performance upgrade. Additionally, we have expanded the context window length to 32K and enhanced the system prompt capability, allowing users to customize their own AI assistant with just a single prompt.
🎁 Qwen-1.8B is our additional gift to the research community, striking a balance between maintaining essential functionalities and maximizing efficiency, generating 2K-length text content with just 3GB of GPU memory.
We are committed to continuing our dedication to the open-source community and thank you all for your enjoyment and support! 🚀 Finally, Happy 1st birthday ChatGPT. 🎂”
https://preview.redd.it/sdofti9odg3c1.jpeg?width=1792&format=pjpg&auto=webp&s=d6f56d56c3596924ea61e1e5429018c0222907d2
Amazing capabilities on some benchmarks, if true.