@while-1-fork

while-1-fork@alien.top · 1 year ago

That is the use case for fine-tuning. Full fine-tuning is just continuing to train the model a little on the new dataset, but if you are not concerned about it forgetting things from the original dataset, you can train for longer.

Also, if the new dataset is too small, you should use data augmentation techniques: having a larger LLM rephrase things, maybe training on translations, putting things into QA format with another LLM, and so on. Training for a lot of epochs with a large learning rate on a small dataset will lead to overfitting and pure exact memorization, to the detriment of understanding and reasoning.
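To make the augmentation idea concrete, here is a rough sketch of how I would lay it out. The names `stub_rephrase`, `stub_to_qa` and `augment` are made up for illustration, and the two stubs only stand in for calls to a larger LLM; you would swap in whatever client you actually use.

```python
from typing import Callable, Dict, List


def stub_rephrase(text: str, variant: int) -> str:
    # Hypothetical placeholder: a real version would ask a larger LLM
    # to rewrite `text` in different words.
    return f"[rephrase {variant}] {text}"


def stub_to_qa(text: str) -> Dict[str, str]:
    # Hypothetical placeholder: a real version would ask an LLM to write
    # a question that the passage answers.
    return {"question": f"What does this passage say? {text[:40]}", "answer": text}


def augment(
    passages: List[str],
    rephrase: Callable[[str, int], str] = stub_rephrase,
    to_qa: Callable[[str], Dict[str, str]] = stub_to_qa,
    n_rephrases: int = 2,
) -> List[str]:
    """Expand a small corpus into more varied training examples."""
    examples: List[str] = []
    seen = set()
    for passage in passages:
        # Keep the original, add rephrasings, then add a QA-formatted version.
        candidates = [passage]
        candidates += [rephrase(passage, i) for i in range(n_rephrases)]
        qa = to_qa(passage)
        candidates.append(f"Q: {qa['question']}\nA: {qa['answer']}")
        for candidate in candidates:
            # Drop exact duplicates so the same string is not trained on twice.
            if candidate not in seen:
                seen.add(candidate)
                examples.append(candidate)
    return examples


docs = ["The valve must be closed before the pump starts."]
augmented = augment(docs)
```

Each passage ends up as several differently worded examples instead of one string repeated over many epochs, which is the point of augmenting in the first place. Translations would slot in as one more kind of candidate.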

Skipping pre-training on a large dataset (that is, not starting from a base model) is going to give worse results. It is like trying to teach a newborn baby whatever you are trying to teach the LLM without teaching it anything about the world first (even to the point of locking it in a black box and only ever showing it your document). In fact it is likely worse: the baby's initialization has been fine-tuned by evolution, and we seem to pick things up with far less data than ANNs.