• 6 Posts
  • 59 Comments
Joined 1 year ago
Cake day: October 30th, 2023

  • I could write a book about this.

    First, no model will be able to write well in style X and then in style Y. Not at the sizes we can work with, and not at the size of ChatGPT either, although it generally works better there (it usually approximates the style reasonably well, but slides into a Hollywood imagination of that style).

    With finetuning, you can make the model do one of those things.

    ## Instruction:

    Rewrite the following text as a poem: Lister sat near the fireplace, looking at his arms which were covered in ointment. I find it hard to believe that pain can form words. Is this something internal or external? On his left arm, the message read “Dying” while on his right arm, it said “Living”. He did not know what to think. A man who had been through a lot.

    ## Response:

    Lister sat by the hearth and gazed

    Upon the ointment-soaked clay

    That coated his sinewy bare arms.

    How could such agony frame

    Such mystic characters? On one arm

    The word “Dying” was graven clear;

    On the other, “Living” appeared.

    What could it mean? Ah! there,

    A man whose life had been one tale

    Of woes, of tortures, of crosses, of shame,

    Who knew not what to hail as his own name.
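A minimal sketch of how rewrite pairs like the one above could be turned into training text for a single-style finetune. It assumes the same "## Instruction:" / "## Response:" layout as the example; the `text` field name, the JSON Lines output, and the helper names are illustrative assumptions, not the format of any particular trainer.

```python
import json

# Layout mirrors the "## Instruction:" / "## Response:" example; exact spacing is an assumption.
TEMPLATE = "## Instruction:\n\n{instruction}\n\n## Response:\n\n{response}"


def make_sample(source_text: str, style: str, rewritten: str) -> dict:
    """Build one training sample that rewrites source_text in a single, fixed style."""
    instruction = f"Rewrite the following text as {style}: {source_text}"
    return {"text": TEMPLATE.format(instruction=instruction, response=rewritten)}


def to_jsonl(samples: list[dict]) -> str:
    """Serialize samples as JSON Lines (one JSON object per line)."""
    return "".join(json.dumps(s, ensure_ascii=False) + "\n" for s in samples)
```

Keeping every sample in one style reflects the point above: a finetune teaches the model to do one of those things, not to switch between many.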

  • My private finetunes are about text rewriting: take an input paragraph and rewrite it in a certain style.

    No finetuned 7B model can grasp the meaning of the submitted text in its entirety; I tried maybe 100 different runs. It makes the typical mistake of someone who just skims the text while also watching YouTube on their phone: it fails to comprehend who is who or what the paragraph is about.

    13B with the same finetuning does much better: it comprehends the relationships. For example, if two people are speaking, it can keep track of who is who, even when the text doesn’t spell it out.

    33B gets even further and sometimes surprises me with how well it understands the text, so the rewritten text is a mirror image of the input, just in a different style.

    7B models are impressive if you want a small local LLM to answer questions, but that’s probably the limit. If you want an assistant that can also do other things, it falls short, because your instructions are not necessarily understood fully.

  • FPham@alien.top to LocalLLaMA: Dynamic LoRAs -- Crazy idea?
    1 year ago

    Are you saying you want a model that will spit out LoRAs? Like “Please generate me a LoRA that will make you totally amazing”?

    If so, this is more in the realm of the Star Trek food replicator. In other words, it works amazingly on a TV screen.

    If not, then sorry.

    The closest thing to this would be a model that picks the correct LoRA needed to reply. Adapters can easily be switched on the fly, so you could build a model that calls a function to select the correct adapter. Maybe this is how ChatGPT works, maybe not.
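A minimal sketch of the "pick the right adapter, then reply" idea. The keyword router, the adapter names, and `FakeAdapterModel` are hypothetical stand-ins; a real setup might use a classifier or function calling to choose. With Hugging Face PEFT, the switch itself would be a call like `model.set_adapter(name)` on a model that already has several adapters loaded.

```python
from dataclasses import dataclass, field

# Hypothetical adapter names and trigger words; a real router could be a classifier
# or a function-calling LLM instead of keyword matching.
ADAPTER_KEYWORDS = {
    "poetry": ["poem", "verse", "rhyme"],
    "rewrite": ["rewrite", "rephrase", "style"],
}
DEFAULT_ADAPTER = "general"


def pick_adapter(prompt: str) -> str:
    """Return the adapter whose keywords appear most often in the prompt."""
    text = prompt.lower()
    best, best_hits = DEFAULT_ADAPTER, 0
    for name, words in ADAPTER_KEYWORDS.items():
        hits = sum(word in text for word in words)
        if hits > best_hits:  # ties keep the earlier adapter
            best, best_hits = name, hits
    return best


@dataclass
class FakeAdapterModel:
    """Stand-in for a base model with several LoRA adapters loaded at once."""
    adapters: set = field(default_factory=lambda: {"general", "poetry", "rewrite"})
    active: str = "general"

    def set_adapter(self, name: str) -> None:
        if name not in self.adapters:
            raise ValueError(f"unknown adapter: {name}")
        self.active = name

    def generate(self, prompt: str) -> str:
        return f"[{self.active}] reply to: {prompt}"


def answer(model: FakeAdapterModel, prompt: str) -> str:
    # Select the adapter per request, then generate with base weights + chosen LoRA.
    model.set_adapter(pick_adapter(prompt))
    return model.generate(prompt)
```

The base weights stay loaded the whole time; only the active adapter changes per request, which is what makes switching on the fly cheap.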



  • FPham@alien.top to LocalLLaMA: local llm in own GUI
    1 year ago

    Don’t start by writing an entire GUI. First write simple code that loads the model and does inference. That’s probably 10 lines in total. You can just grab the code people post with their models (TheBloke, for example, always posts a code snippet).
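The "ten lines" step could look roughly like this, assuming the Hugging Face `transformers` library (plus `torch`) and a causal LM from the Hub. This is a generic sketch, not any particular model card’s snippet; the import sits inside `load_model` so `run_inference` can be tried without the library installed.

```python
def load_model(model_id: str):
    """Load a causal LM and its tokenizer from the Hugging Face Hub (downloads on first use)."""
    # Imported here so the rest of the script works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def run_inference(tokenizer, model, prompt: str, max_new_tokens: int = 200) -> str:
    """Tokenize the prompt, generate a continuation, and decode it back to text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # output_ids[0] is the first (only) sequence; it still contains the prompt tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage: `tokenizer, model = load_model("gpt2")`, then `print(run_inference(tokenizer, model, "Once upon a time"))`. The model ID is only an example; swap in whatever model you actually run.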

    Now create the GUI: instead of a preset text, add a text box and a “Go” button so it runs inference on your text.

    Boom, a GUI.
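A minimal version of that GUI step, sketched with Python’s built-in `tkinter` (an assumption; any toolkit works). `generate` is any function that takes the prompt text and returns the model’s reply, for example a wrapper around your model-loading script. The button logic lives in `on_go` so it can be tested without opening a window.

```python
from typing import Callable

Generate = Callable[[str], str]


def on_go(prompt: str, generate: Generate) -> str:
    """What the "Go" button does: run inference on the text box contents."""
    prompt = prompt.strip()
    if not prompt:
        return "(type something first)"
    return generate(prompt)


def build_gui(generate: Generate):
    """Build a minimal window: input box, Go button, output box."""
    import tkinter as tk  # imported here so on_go works without a display

    root = tk.Tk()
    root.title("Local LLM")

    prompt_box = tk.Text(root, height=8, width=80)
    prompt_box.pack(padx=8, pady=4)
    output_box = tk.Text(root, height=16, width=80)

    def handle_click() -> None:
        reply = on_go(prompt_box.get("1.0", tk.END), generate)
        output_box.delete("1.0", tk.END)
        output_box.insert(tk.END, reply)

    tk.Button(root, text="Go", command=handle_click).pack(pady=4)
    output_box.pack(padx=8, pady=4)
    return root
```

Call `build_gui(my_generate).mainloop()` to open the window, where `my_generate` is your own (hypothetical here) prompt-to-text function. Dropdowns for model or template selection can be added to the same `root` later.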

    Now go from there and keep adding: a dropdown list to select the model, a dropdown list to select the instruction template, etc.


  • If your epoch is 50 steps, then you are not going to use 100 warmup steps.

    In the Training Pro extension I use 0.1 of the total steps for warmup, but a maximum of 100 (there isn’t much point going higher: after 100 steps you should have primed most of the weights).

    So if you have 3800 samples, which is a ton, 100 warmup steps is as good as any.
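The warmup rule above as arithmetic: 0.1 of the total steps, capped at 100. The batch size, gradient accumulation, and epoch count used below are hypothetical, and this is a sketch of the rule as described, not Training Pro’s actual code.

```python
import math


def total_steps(num_samples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Optimizer steps for a run: ceil(samples / effective batch) per epoch."""
    steps_per_epoch = math.ceil(num_samples / (batch_size * grad_accum))
    return steps_per_epoch * epochs


def warmup_steps(total: int, ratio: float = 0.1, cap: int = 100) -> int:
    """Warmup is a fraction of the total steps, but never more than the cap."""
    return min(cap, int(total * ratio))
```

For example, 3800 samples at a hypothetical batch size of 4 for 3 epochs gives 2850 steps; 0.1 of that would be 285, and the cap brings it down to 100. A 50-step run gets only 5 warmup steps.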

    val_set_size seems to be the size of the evaluation data. Whether you want to use evaluation data at all depends on the training (some types of training have no reason to use evaluation data, as it will not evaluate anything useful). Again, with a big dataset 0.04 is fine. With a small dataset, 0.04 will create a single evaluation sample, and you are far better off not having ANY evaluation dataset.
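What a fractional val_set_size means in sample counts. This sketch assumes the evaluation count is rounded up (trainers may round differently), and `eval_split` is a hypothetical helper name.

```python
import math


def eval_split(num_samples: int, val_set_size: float) -> tuple[int, int]:
    """Return (train, eval) sample counts for a fractional val_set_size.

    Assumes the eval count is rounded up; real trainers may round differently.
    """
    if not 0.0 <= val_set_size < 1.0:
        raise ValueError("this sketch only handles fractions in [0, 1)")
    n_eval = math.ceil(num_samples * val_set_size)
    return num_samples - n_eval, n_eval
```

With 3800 samples, 0.04 gives 152 evaluation samples; with 25 samples it gives a single one, which tells you almost nothing, while setting it to 0 keeps every sample for training.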