Removing the Refusal Direction: How I Turned Param-1 Into an Uncensored Model Without Fine-Tuning
Large language models are often designed to refuse harmful or sensitive requests. In this work, I identified a “refusal direction” in the Param-1-2.9B-Instruct model and ablated it, effectively con...