The ridle json + Sydney actually makes the model far more lucid than normal. I applied it to MythoMax and the answers are really good.
I find it very strange to attach the gguf file to an exe - it’s a very bad security idea (your antivirus needs to hash a 10 GB file), and on Windows you still need to split it into an exe and a data file because of the 4 GB exe limit. So basically, instead of llama.cpp you are now using llamafile, which is llama.cpp. Or am I missing something?
Geez, Alexa is in the Middle Ages… I want a fun Alexa!
Hahaha, this is totally Llama 2 with Alexa’s voice.
Call it voodoo, not an art.
And they also made an LLM for the users to abuse… Wait until Twitter replies start sounding like The Hitchhiker’s Guide to the Galaxy and citing Vogon poetry, because that’s the “style” of Grok.
I use Twitter while drinking my morning coffee - it makes it 2x stronger.
I’m thinking that’s probably it.
Commu - what? Blasphemy…
That looks very clean for sure.
My private finetunes are about text rewriting - take an input paragraph and rewrite it in a certain style.
No 7b finetuned model can grasp the idea of the submitted text in its entirety - I tried maybe 100 different runs. It makes the common mistake of someone who just skims the text while also watching YouTube on a phone, failing to comprehend who is who or what the paragraph is about.
13b with the same finetuning does much better - it comprehends the relations. For example, if two people are speaking, it can keep track of who is who, even without it being spelled out in the text.
33b gets even further - it sometimes surprises with the way it understands the text, and so the rewritten text is a mirror image of the input, just in a different style.
7b models are impressive if you want a small local LLM to answer questions, but that’s probably the limit. If you want an assistant that can also do other things, it falls short, because your instructions are not necessarily understood fully.
Well, no Llama model would come even close to GPT-4 in translation…
On a somewhat similar note of adding noise during finetuning to help with generalization: if you are using oobabooga, you can look at Training PRO
https://github.com/FartyPants/Training_PRO
And then experiment with the NEFTune noise scale.
It is a somewhat similar idea - but on the other end, during training; I assume you are talking about adding noise at inference, in the sampler. Worth pursuing for sure - the results, however, are unpredictable before you try it…
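For context, NEFTune simply adds uniform noise to the embedding outputs during training. A minimal sketch of the idea (not the exact Training PRO implementation - noise_alpha is the scale you would experiment with):

```python
import math
import torch

def neftune_hook(module, inputs, output, noise_alpha=5.0):
    # During training only: add uniform noise to the embedding output,
    # scaled by alpha / sqrt(seq_len * hidden_dim) - the NEFTune scaling.
    if module.training:
        seq_len, hidden_dim = output.shape[-2], output.shape[-1]
        mag = noise_alpha / math.sqrt(seq_len * hidden_dim)
        output = output + torch.empty_like(output).uniform_(-mag, mag)
    return output

# Hypothetical usage: register the hook on the token embedding layer before training.
# model.get_input_embeddings().register_forward_hook(neftune_hook)
```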
Get a base model of your choice and finetune it with a plain-text book in the style you want it to talk. Done.
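If you’d rather do that in code than through a UI, a rough transformers + peft version of that recipe looks something like this (the model name, file name and hyperparameters are just placeholders):

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"          # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Plain-text book, no instruction template - just next-token prediction on the style.
data = load_dataset("text", data_files={"train": "book.txt"})["train"]
data = data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments("style-lora", num_train_epochs=3,
                           per_device_train_batch_size=4,
                           learning_rate=2e-4, warmup_steps=100),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("style-lora")
```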
There is very little info.
It seems to be instruction-finetuned, but with what template? ChatML? There is no mention of anything. Posting it this way is pretty bad.
Are you saying you want a model that will spit out LoRAs? Like “Please generate me a LoRA that will make you totally amazing”?
If so, this is more in the realm of a Star Trek food replicator. AKA it works amazingly on a TV screen.
If not, then sorry.
The closest to this would be a model that picks up the correct LoRA needed to reply. Adapters can be easily switched on the fly, so a model could be made that calls a function to select the correct adapter. Maybe this is how ChatGPT works, maybe not.
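A rough sketch of that adapter-switching idea with peft (the adapter paths and the keyword "router" are made up for illustration):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical base model and adapter paths - substitute your own.
base = AutoModelForCausalLM.from_pretrained("base-model", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("base-model")

# Attach several LoRA adapters under their own names.
model = PeftModel.from_pretrained(base, "loras/coding", adapter_name="coding")
model.load_adapter("loras/roleplay", adapter_name="roleplay")

def pick_adapter(prompt: str) -> str:
    # Toy "router" - in practice this could be a classifier or a function call
    # made by the model itself.
    return "coding" if "code" in prompt.lower() else "roleplay"

prompt = "Write code that reverses a string."
model.set_adapter(pick_adapter(prompt))   # switch on the fly, no reload needed
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```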
The more it doesn’t know something, the more it will create perfect nonsense with a straight face. It’s like replies on Twitter.
Don’t start by writing an entire GUI. First make a simple piece of code that loads the model and does inference - that’s probably 10 lines in total. You can just grab the code people post with their models (TheBloke always posts a code snippet).
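For reference, a minimal load-and-generate script with transformers looks roughly like this (the model name is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Gryphe/MythoMax-L2-13b"   # example only - use whatever model you picked
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "### Instruction:\nTell me a joke.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```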
Now create a GUI: instead of the preset text, add a text box and a “Go” button so it can do inference on your text.
Boom, a GUI.
Now go from there and keep adding: a dropdown list to select the model, a dropdown list to select the instruction template… etc.
If your epoch is 50 steps, then you are not going to use 100 warmup steps.
In the Training PRO extension I use 0.1 of the total steps for warmup, but at most 100 (there isn’t much point going higher - after 100 steps you should have primed most of the weights).
So if you have 3800 samples, which is a ton, 100 warmup steps is as good as any.
val_set_size seems to be the size of the evaluation split. Now it depends on whether you even want to use evaluation data or not (some types of training have no reason to use evaluation data, as it will not evaluate anything useful). Again, with a big dataset 0.04 is fine. With a small dataset 0.04 will create 1 evaluation sample - you are far better off not having ANY evaluation data.
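If it helps, here is the same sizing logic as a toy snippet (the 10-sample cutoff is just an arbitrary threshold for the example, not a rule from any library):

```python
# Toy illustration of the sizing described above - not library code.
def warmup_and_val(num_samples, total_steps, val_fraction=0.04):
    # Warmup: 10% of total steps, capped at 100 - beyond that there is little benefit.
    warmup_steps = min(100, int(0.1 * total_steps))

    # Evaluation split: only keep it if it yields a meaningful number of samples.
    val_samples = int(num_samples * val_fraction)
    if val_samples < 10:
        val_samples = 0   # better to skip evaluation entirely
    return warmup_steps, val_samples

print(warmup_and_val(3800, total_steps=1200))   # (100, 152) - plenty of eval data
print(warmup_and_val(30, total_steps=50))       # (5, 0) - one eval sample is useless, drop it
```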
I could write a book about this.
First, no model will be able to write well in style x and then in style y. Not at the sizes we can work with. Not at the size of ChatGPT either, although it would generally work better (it usually approximates the style well and falls into a Hollywood imagination of that style).
With finetuning you can make the model do one of those things.
## Instruction:
Rewrite the following text as a poem: Lister sat near the fireplace, looking at his arms which were covered in ointment. I find it hard to believe that pain can form words. Is this something internal or external? On his left arm, the message read “Dying” while on his right arm, it said “Living”. He did not know what to think. A man who had been through a lot.
## Response:
Lister sat by the hearth and gazed
Upon the ointment-soaked clay
That coated his sinewy bare arms.
How could such agony frame
Such mystic characters? On one arm
The word “Dying” was graven clear;
On the other, “Living” appeared.
What could it mean? Ah! there,
A man whose life had been one tale
Of woes, of tortures, of crosses, of shame,
Who knew not what to hail as his own name.