LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B

ninjasaid13@alien.top · 2 years ago

a_beautiful_rhind@alien.top · 2 years ago

Yeah, no shit. I did it to Vicuna using proxy logs. The LLM attacks are waaaay more effective once you find the proper string.

I’d run the now-working 4-bit version on more models, but I tend to boycott censored weights instead.