Hey all! A friend and I have been building with open-source LLMs for a while now (originally for other project ideas) and found that quickly iterating with different fine-tuning datasets is super hard. Training a model, setting up some inference code to try out the model and then going back and forth took 90% of our time.
That’s why we built Haven, a service to quickly try out different fine-tuning datasets and base models. Going from uploading a dataset to chatting with the resulting model now takes less than 5 minutes (using a reasonably sized dataset).
We fine-tune the models using low-rank adapters. This not only keeps the changes made to the model very small (only 30 MB for a 7B-parameter LLM), it also lets us host many fine-tuned models very efficiently by hot-swapping adapters on demand. This helped us reduce cold-start times to below one second. Research has shown that low-rank fine-tuning performance stays almost on par with full fine-tuning.
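As a rough sanity check on the adapter size, here is a back-of-the-envelope sketch. The rank (16) and the choice of adapting four attention projections per layer are assumptions for illustration, not Haven's confirmed settings; the hidden size and layer count are the public Llama-2-7B values.

```python
def lora_param_count(hidden_size, num_layers, rank, modules_per_layer):
    """Count trainable parameters added by low-rank adapters.

    Each adapted square projection (hidden x hidden) gets two factors:
    A with shape (rank x hidden) and B with shape (hidden x rank).
    """
    per_module = 2 * hidden_size * rank
    return num_layers * modules_per_layer * per_module


# Llama-2-7B: hidden size 4096, 32 transformer layers.
# ASSUMPTION: rank 16, adapters on the q, k, v and o projections.
params = lora_param_count(hidden_size=4096, num_layers=32, rank=16, modules_per_layer=4)

# ASSUMPTION: weights stored in fp16, i.e. 2 bytes per parameter.
size_mb = params * 2 / 1e6

print(f"{params:,} adapter parameters, about {size_mb:.1f} MB in fp16")
```

Under those assumptions the adapter lands in the same ballpark as the 30 MB figure, which is why swapping adapters in and out is so much cheaper than loading a full set of 7B weights.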
We charge $0.004/1k training tokens. New accounts start with $5 in free credits so you can get started for free. You can export all the models to Hugging Face.
Right now we support Llama-2 and Zephyr (which is itself a fine-tune of Mistral) as base models. We’re gonna add some more soon. We hope you find this useful and we would love your feedback!
This is where to find it:
https://haven.run/
This is an amazing sub with amazingly talented individuals. I love it here. This is great.
This means a lot! Thank you.
could you provide some directions on how to fine-tune the model for coding? i have a ui framework in python, and i would like to feed it the docs and code from some github repos.
what would the dataset look like for that? should i be formulating different use cases for the framework as if the user is asking?
in addition, do i need to provide standard python code or do those base models have code in them already?
All weekend I’ve been wishing for a more streamlined fine-tuning experience.
Glad to hear that we’re not the only ones!
Interesting service, I’m definitely going to try it. I’d like to fine-tune a 7B for function calling and, if possible, mimic OpenAI’s function description template so I can share the descriptions between model calls. I’ve experimented with injecting the function descriptions as a preamble to the user’s prompt, and it works OK (with Mistral 7B Instruct) but with many edge cases. I suspect I need to fine-tune to get it to improve. How would I go about structuring my user prompts in the training dataset? Would something like this work?:
{"messages": [{"role": "system", "content": "You are a helpful navigation assistant that calls the appropriate function based on a user's input."}, {"role": "user", "content": "Go to Paris, France"}, {"role": "assistant", "content": "{\"lat\": 48.856667, \"lng\": 2.352222}"}]}
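The nested quotes in a record like this are easy to get wrong by hand, so it may be safer to generate each line with a JSON library. This is only a sketch: the `messages` layout is taken from the example above, whether Haven expects exactly this schema is an assumption, and `make_example` is a hypothetical helper name.

```python
import json


def make_example(system_prompt, user_text, call_args):
    """Build one chat-style training record (hypothetical helper).

    The function arguments are serialized to a JSON string first, so the
    assistant "content" is a plain string, as in the example above.
    """
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": json.dumps(call_args)},
        ]
    }


record = make_example(
    "You are a helpful navigation assistant that calls the appropriate "
    "function based on a user's input.",
    "Go to Paris, France",
    {"lat": 48.856667, "lng": 2.352222},
)

# json.dumps escapes the inner quotes, giving one valid JSONL line.
line = json.dumps(record)
print(line)
```

Writing one such line per example gives a JSONL file in which every record parses cleanly, including the JSON embedded in the assistant turn.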
Why not just use grammar sampling with llama.cpp?
Is it possible to do this in a way that allows the model to choose whether to write normal text or to call one or more functions?
Well, you don’t have to have it ever write “normal” text. You can just have an object with a “text” property that the model is instructed to use only when it is not calling a function. Otherwise, it can provide different function-calling JSON.
A grammar means it’s forced to output a structure, in this case JSON. You can write instructions to output different JSON based on different scenarios and use code to check which key is present in the JSON. If the object has the key “text”, it’s a text response. If it doesn’t, it’s a function response.
That’s basically how the function-calling API works anyway, just less consistently than a grammar.
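The key check could look roughly like this. It is a minimal sketch that assumes the grammar already guarantees valid JSON; the `function` and `arguments` key names and the `navigate` function are made up for illustration, and only the “text” key comes from the description above.

```python
import json


def handle_model_output(raw, functions):
    """Route grammar-constrained model output to text or a function call."""
    obj = json.loads(raw)
    if "text" in obj:
        # Plain text response: hand it back unchanged.
        return obj["text"]
    # Otherwise treat it as a function call.
    # ASSUMPTION: calls look like {"function": name, "arguments": {...}}.
    name = obj["function"]
    args = obj.get("arguments", {})
    return functions[name](**args)


def navigate(lat, lng):
    # Hypothetical function the model is allowed to call.
    return f"navigating to {lat}, {lng}"


functions = {"navigate": navigate}

reply = handle_model_output('{"text": "Hello!"}', functions)
result = handle_model_output(
    '{"function": "navigate", "arguments": {"lat": 48.856667, "lng": 2.352222}}',
    functions,
)
print(reply)
print(result)
```

Looking functions up in a dict keeps the model from calling anything you did not explicitly register.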
This is really cool! Good choice on starting with the chat model and not the base model. They are much more friendly to alignment with a small dataset. In your post you mention you do QLoRA in a few minutes. I am assuming that’s for a small dataset, like <1000 samples? What’s your backend running on? I would love to learn how you are deploying and scaling this for multiple customers. Best of luck!
Yes, our datasets usually have a few hundred examples. We do support arbitrarily large datasets though, the fine-tuning just takes a little longer.
For deploying and scaling we’re using Modal, a “serverless” GPU provider that we found to be very user-friendly.
hey, can I do the fine-tuning on my own computer or only in your cloud?
Fine-tuning is online. You can download the weights and run them wherever (including your own computer).