• 1 Post
  • 2 Comments
Joined 1 year ago
Cake day: November 14th, 2023

  • Maybe I missed it, but the most important argument might have slipped through, which is quite simply that GPT-4 looks and feels good. However, if you have a clear task (anything, literally - examples are data structuring pipelines, information extraction, repairing broken data models), then a fine-tuned Llama model will make GPT-4 look like a toddler. It's crazy, and if you don't believe me I can only recommend that everyone give it a try and benchmark the results (see the sketch below). It is that much of a difference. Plus, it allows you to iron out bugs in GPT-4's understanding. There are clear limits to where prompt engineering can take you.

    To be clear, I am really saying that there are things GPT-4 just cannot do where a fine-tuned Llama just gets the job done.
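
    A minimal sketch of the kind of side-by-side benchmark I mean, assuming a hypothetical fine-tuned checkpoint path and a tiny hand-labelled extraction set (both are made up for illustration):

    import json
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Hypothetical fine-tuned checkpoint; swap in your own model directory.
    MODEL_PATH = "./llama-7b-extraction-finetune"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")
    generate = pipeline("text-generation", model=model, tokenizer=tokenizer)

    # Tiny hand-labelled test set: raw text in, expected structured record out.
    test_cases = [
        {
            "text": "Invoice 4711 from ACME, total 129.90 EUR",
            "expected": {"invoice_id": "4711", "vendor": "ACME", "total": "129.90", "currency": "EUR"},
        },
    ]

    correct = 0
    for case in test_cases:
        prompt = f"Extract the invoice fields as JSON.\n\nText: {case['text']}\nJSON:"
        completion = generate(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"]
        try:
            # The pipeline echoes the prompt, so parse only the continuation.
            predicted = json.loads(completion[len(prompt):].strip())
        except json.JSONDecodeError:
            predicted = None
        correct += int(predicted == case["expected"])

    print(f"exact-match accuracy: {correct / len(test_cases):.2f}")

    Running the same test set through GPT-4 and comparing the two scores is the whole benchmark.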


  • Hi u/mcmoose1900, thanks a lot for the reply!

    To my understanding, I have already been making use of PEFT and LoRA since starting this endeavour.

    See excerpts of the code here (there is a chance that it does not get used as intended, given the often weird ways Python works); a small sanity check for the quantized load follows the first block.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Quantization settings: load the base model in 4-bit NF4 with double
    # quantization and fp16 compute to keep the memory footprint down.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        load_in_8bit=False,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    # Load the quantized base model and let accelerate place it on the GPU(s).
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )
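
    To check whether the 4-bit load actually takes effect, I would run something like this quick sanity check (a sketch, not part of my training script):

    # Rough sanity check: the 4-bit footprint should be a fraction of the
    # model's full fp16 size.
    print(f"memory footprint: {base_model.get_memory_footprint() / 1e9:.2f} GB")
    print("loaded in 4-bit:", getattr(base_model, "is_loaded_in_4bit", "unknown"))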
    

    and here is the LoRA and trainer setup

    from peft import LoraConfig
    from trl import SFTTrainer

    # LoRA adapter configuration: rank 64, alpha 16, applied to a causal LM.
    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.2,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
    )

    # Supervised fine-tuning trainer; passing peft_config makes it wrap the
    # quantized base model with the LoRA adapters.
    max_seq_length = MAX_SEQ_LENGTH
    trainer = SFTTrainer(
        model=base_model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        peft_config=peft_config,
        formatting_func=formatting_func,
        max_seq_length=max_seq_length,
        tokenizer=tokenizer,
        args=training_args,
    )
    

    and the parameters

    MAX_SEQ_LENGTH = 8192              # passed to SFTTrainer as max_seq_length
    LEARNING_RATE = 2e-5
    PER_DEVICE_BATCH_SIZE = 1          # already at the minimum
    GRADIENT_ACCUMULATION_STEPS = 1
    USE_EVAL = True
    QUANT_BIT_8 = False
    QUANT_BIT_4 = not QUANT_BIT_8      # exactly one quantization mode active
    

    The numbers above are very low because I tried lowering them to mitigate the OOM issue, without success; normally they would not make sense.
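
    For completeness, two things I would still double-check, as a sketch on top of the excerpts above rather than something already in my script: that the usual memory-saving switches are enabled in the TrainingArguments, and that the trainer really wrapped the model with the LoRA adapters.

    # Sketch of TrainingArguments with the usual memory savers turned on
    # (this would be built before the SFTTrainer above; values are placeholders).
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./out",
        per_device_train_batch_size=PER_DEVICE_BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        learning_rate=LEARNING_RATE,
        fp16=True,
        gradient_checkpointing=True,   # trades extra compute for much lower activation memory
        optim="paged_adamw_8bit",      # paged 8-bit optimizer states via bitsandbytes
    )

    # After the trainer is constructed, confirm the LoRA wrapping is in effect:
    # only a small fraction of the parameters should be reported as trainable.
    trainer.model.print_trainable_parameters()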