Noob's guide to mechanistic interpretability
Here’s a fun fact: nobody fully understands why large language models work. We know the math, we know the architecture, we can train them. But ask “why did it output this specific token?” and the h...
Large language models are often designed to refuse harmful or sensitive requests. In this work, I identified a “refusal direction” in the Param-1-2.9B-Instruct model and ablated it, effectively con...
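A common way to ablate a direction like this is to project it out of the model's hidden states. The post's exact method isn't shown in this excerpt, so the following is only a minimal sketch of directional ablation on toy vectors, with a hypothetical `ablate` helper standing in for the real intervention:

```python
# Hypothetical sketch of directional ablation: remove the component of a
# hidden activation h along a "refusal direction" r, leaving the rest of
# the representation untouched. The vectors here are toy stand-ins, not
# real Param-1-2.9B-Instruct activations.
import math

def ablate(h, r):
    """Project out direction r from activation h: h - (h . r_hat) r_hat."""
    norm = math.sqrt(sum(x * x for x in r))
    r_hat = [x / norm for x in r]
    dot = sum(a * b for a, b in zip(h, r_hat))
    return [a - dot * b for a, b in zip(h, r_hat)]

h = [3.0, 4.0]       # toy activation
r = [1.0, 0.0]       # toy refusal direction
print(ablate(h, r))  # -> [0.0, 4.0]: the component along r is gone
```

Applied at every layer, this kind of projection removes the model's ability to represent "refuse" along that one direction while (ideally) leaving everything else intact.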
Note: I am still working on this paper, and this iteration may contain errors and assumptions. I will iterate on this idea with a slightly bigger model like Qwen3 0.6B. Abstract I investigate how ...
Using a custom-trained Word2Vec model, centroid-based embeddings, and a FAISS ANN index, I built a semantic search engine that retrieves relevant arXiv papers in <10 ms, even at million-scale. T...
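The core idea — average word vectors into one centroid per document, then rank by cosine similarity — fits in a few lines. The post uses a custom Word2Vec model and a FAISS ANN index; this is only a stdlib sketch with hand-made toy vectors and a linear scan standing in for both:

```python
# Hypothetical sketch: centroid embeddings + brute-force cosine search.
# Toy 3-d "word vectors" stand in for the custom Word2Vec model, and a
# linear scan stands in for the FAISS ANN index.
import math

word_vecs = {
    "neural":  [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "protein": [0.0, 0.9, 0.3],
    "folding": [0.1, 0.8, 0.4],
}

def centroid(words):
    """Average the vectors of known words -> one document embedding."""
    vecs = [word_vecs[w] for w in words if w in word_vecs]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

papers = {
    "Attention Is All You Need": ["neural", "network"],
    "AlphaFold": ["protein", "folding"],
}
index = {title: centroid(words) for title, words in papers.items()}

def search(query_words, k=1):
    q = centroid(query_words)
    ranked = sorted(index, key=lambda t: cosine(q, index[t]), reverse=True)
    return ranked[:k]

print(search(["neural"]))  # -> ['Attention Is All You Need']
```

At million-scale the linear scan is the part FAISS replaces: an approximate index trades a little recall for sub-10 ms lookups.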
Ever wondered how a computer can learn to tell apples from oranges? At the heart of it there is something deceptively simple - the Perceptron. In this post, we’ll discuss the Perceptron and MLPs ...
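The classic perceptron learning rule fits in a dozen lines. As a teaser for the post above, here is a minimal sketch (toy data, not from the post) that learns the linearly separable AND function:

```python
# Minimal perceptron sketch: a step activation plus the classic
# error-driven update rule, trained on the (linearly separable) AND gate.
def step(x):
    return 1 if x >= 0 else 0

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data, epochs=20, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)       # 0 when correct
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_data)
print([predict(w, b, x) for x, _ in and_data])  # -> [0, 0, 0, 1]
```

A single perceptron can only draw one straight line, which is why XOR defeats it and why MLPs stack layers of them.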