Last month, we announced LoRAX (LoRA eXchange), a framework that makes it possible to serve hundreds of fine-tuned LLMs on a single GPU with minimal degradation in throughput and latency (see the original LoRAX blog post). Today, we’re excited to release LoRAX to the open-source community under the permissive, commercial-friendly Apache 2.0 license.
What is LoRAX?
LoRAX works by dynamically loading the fine-tuned “adapter” weights at runtime. Combined with an optimized caching and scheduling policy that fuses multiple adapters into a single batch, LoRAX gives you the best of both worlds: low-cost serving with high performance. 💸 🏎️
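To make that concrete, here’s a minimal sketch of what multi-adapter serving looks like from the client side, using the lorax-client Python package against a locally running server. The adapter IDs below are hypothetical placeholders, not real repos, and the exact flags may differ from the current README.

```python
# A minimal sketch, assuming a LoRAX server is already running locally
# (e.g., via the pre-built Docker image) and `pip install lorax-client`.
from lorax import Client

client = Client("http://127.0.0.1:8080")
prompt = "[INST] What is low-rank adaptation? [/INST]"

# Request against the base model: no adapter specified.
print(client.generate(prompt, max_new_tokens=64).generated_text)

# Requests against two different fine-tuned adapters (hypothetical IDs).
# LoRAX loads each adapter's weights dynamically and can fuse requests
# for different adapters into the same batch on one base model.
for adapter_id in ["your-org/summarization-adapter", "your-org/sql-adapter"]:
    response = client.generate(prompt, adapter_id=adapter_id, max_new_tokens=64)
    print(adapter_id, "->", response.generated_text)
```

Because every adapter shares the same base model weights, switching adapters is just a request parameter rather than a separate deployment.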
Why open source?
At Predibase, we believe the future is smaller, faster, and cheaper fine-tuned models. To get there, we as a community must work together to make serving fine-tuned models cost-competitive with the big commercial APIs.
As the core maintainers of Ludwig (https://ludwig.ai/) and Horovod (https://github.com/horovod/horovod), we’re no strangers to building communities around open-source AI. This isn’t a side project for us; it’s the foundation of our mission. 💪
Why join the LoRAX community?
🚢 Built for scale. LoRAX isn’t an academic project; it’s production infrastructure, batteries included: pre-built Docker images, Helm charts for Kubernetes, metrics, and telemetry.
🤝 Research meets production. LoRAX brings the best ideas from research together into a single production framework. For example, we recently integrated the SGMV kernel from Punica (https://arxiv.org/abs/2310.18547) for significant performance improvements.
🕊️ Commercially viable, always. Whether you’re an individual developer or an AI platform like Predibase, you can build on LoRAX thanks to the permissive Apache 2.0 license.
Try LoRAX yourself today, and join the community to contribute and receive updates as we continue to invest in growing LoRAX in the weeks and months ahead.
Blog: https://predibase.com/blog/lorax-the-open-source-framework-for-serving-100s-of-fine-tuned-llms-in
GitHub: https://github.com/predibase/lorax
So, multi-model MoE on consumer hardware? Damn, I’m in.
How does LoRAX compare to S-LoRA?
I too would like a comparison of these two techniques.
Wait, is this what GPT-4 is?? Because if you’ve noticed, there’s a noticeable delay when you submit input to GPT-4 compared to when you submit to GPT-3.5.