ninjasaid13@alien.top to LocalLLaMA · 2 years ago
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B (arxiv.org)
13 comments
squareOfTwo@alien.top · 2 years ago
They and their made-up, pseudo-scientific “alignment” piss me off so much.

No, a model won’t just have a stroke of genius and decide to hack into a computer, for many reasons. Hallucination is one of them: guess a single wrong token in a program and the attack doesn’t work. Oh, and don’t forget that the tokens don’t even fit into the context window.