Hey all! A friend and I have been building with open-source LLMs for a while now (originally for other project ideas) and found that quickly iterating with different fine-tuning datasets is super hard. Training a model, setting up some inference code to try out the model and then going back and forth took 90% of our time.
That’s why we built Haven, a service to quickly try out different fine-tuning datasets and base-models. Going from uploading a dataset to chatting with the resulting model now takes less than 5 minutes (using a reasonably sized dataset).
We fine-tune the models using low-rank adapters, which not only means that the changes made to the model are very small (only 30mb for a 7b parameter LLM), it also allows us to host many fine-tuned models very efficiently by hot swapping adapters on demand. This helped us reduce cold-start times to below one second. Research has shown that low-rank fine-tuning performance stays almost on-par with full fine-tuning.
We charge $0.004/1k training tokens. New accounts start with $5 in free credits so you can get started for free. You can export all the models to Huggingface.
Right now we support Llama-2 and Zephyr (which is itself a fine-tune of Mistral) as base-models. We’re gonna add some more soon. We hope you find this useful and we would love your feedback!
This is where to find it:
https://haven.run/


Why not just use grammar sampling with Llama cpp?
Is it possible to do this in a way that allows the model to choose whether to write normal text or to call one or more functions?
Well, you don’t have to have it ever write “normal” text. You can just have an object with a “text” property that the model is instructed to use only when it is not calling a function. Otherwise, it can provide different function calling json.
A grammar means it’s forced to output a structure, in this case, json. You can write instructions to output different json based on different scenarios and use code to check which key is present in the json. If the object has the key “text” its a text response. If it doesn’t its a function response.
That’s basically how the function call api works anyway, just less consistent than grammar.