When instruction fine-tuning Mistral, which parameters need to be set for the tokenizer?
What is the default EOS token for Mistral, and should padding be on the left or the right?

I’ve exhausted all the online articles trying to find this. Please help. I’m instruction fine-tuning base Mistral for a text-to-SQL task.

  • kivathewolf@alien.top
    1 year ago

    Per the tokenizer_config.json for the Mistral instruct model, the EOS token is </s>. You can use the same. If you check the tokenizer file for the base model, </s> is defined as a special token there too, so it will work fine as EOS. Regarding padding: the reason you define padding is so that all sequences in a batch have the same length during tuning. Format each example in your dataset with <s> at the start and </s> at the end, and pad to the right.
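To make the formatting and right-padding advice concrete, here is a minimal pure-Python sketch. The helper names `build_example` and `pad_right`, the prompt layout, and the toy token IDs are assumptions for illustration, not part of any library. Note that many tokenizers add the BOS token themselves, so it is worth checking that `<s>` does not end up duplicated.

```python
BOS, EOS = "<s>", "</s>"  # special tokens named in the comment above


def build_example(question: str, sql: str) -> str:
    """Wrap one training example in BOS/EOS (hypothetical prompt layout)."""
    return f"{BOS}{question}\n{sql}{EOS}"


def pad_right(batch, pad_id):
    """Right-pad lists of token IDs to the longest sequence in the batch.

    Returns the padded IDs and an attention mask
    (1 = real token, 0 = padding).
    """
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


# Toy IDs for illustration only: 1 stands for <s>, 2 for </s>, 0 for padding.
text = build_example("List all users.", "SELECT * FROM users;")
ids, mask = pad_right([[1, 5, 6, 2], [1, 7, 2]], pad_id=0)
```

With right padding the real tokens, including the closing `</s>`, keep a mask value of 1 and only the trailing filler is masked out.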

    Btw, why are you fine tuning the base model for text to sql? Won’t it be better to fine tune the instruct model for this? You can use the same prompt template as the instruct model uses. Good luck and let me know how it goes.

    • weedyuh@alien.top
      1 year ago

      Thank you! I used </s> when fine-tuning the base Mistral model, but during inference the output didn’t stop after the answer SQL query. The model didn’t generate the end token after the answer, and the result went on and on in an auto-completion style.

      I didn’t think to use the instruct model for this! I will give that a try and let you know in a few days.

      • kivathewolf@alien.top
        1 year ago

        I believe you are using LoRA? How are you training, and which library are you using? In my experience (which is limited), many libraries don’t set the attention mask to 1 for the EOS token, so the model is trained to ignore it. If you use the Hugging Face Trainer, you need to define your own mapping function that sets the attention mask for the EOS token to 1. Make sure the dataset used for training also has </s> at the end of each response. If you do that, you probably don’t need to mess with the attention mask. All these problems go away when you use an instruct model, as it’s already trained to stop at the end of its answer. If you use the same prompt format in your fine-tuning dataset, that will work well.
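A rough sketch of the kind of mapping function described above, in plain Python. It assumes right-padded examples stored as dicts with `input_ids` and `attention_mask`, an EOS ID of 2, and the common convention of using -100 as the label value that the loss ignores. The function name `keep_eos` is hypothetical; treat this as one possible approach rather than the required one.

```python
EOS_ID = 2           # assumed ID of </s>; check your tokenizer
IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def keep_eos(example):
    """Ensure the example ends in an attended, trainable EOS token.

    Assumes right padding, so the real tokens occupy the first
    sum(attention_mask) positions.
    """
    ids = list(example["input_ids"])
    mask = list(example["attention_mask"])
    real_len = sum(mask)

    # If the last real token is not EOS, place one right after it.
    if real_len == 0 or ids[real_len - 1] != EOS_ID:
        if real_len < len(ids):
            # Reuse the first padding slot (covers the pad == eos case,
            # where the EOS is present but masked out).
            ids[real_len] = EOS_ID
            mask[real_len] = 1
        else:
            ids.append(EOS_ID)
            mask.append(1)

    # Train on every attended token, including EOS; ignore padding.
    labels = [t if m == 1 else IGNORE_INDEX for t, m in zip(ids, mask)]
    return {"input_ids": ids, "attention_mask": mask, "labels": labels}


# EOS was used as the pad token, so the real EOS had been masked out.
fixed = keep_eos({"input_ids": [1, 5, 6, 2, 2],
                  "attention_mask": [1, 1, 1, 0, 0]})
```

A function of this shape could be applied per example, for instance through a dataset's map step, before the data reaches the trainer.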