Noob's guide to mechanistic interpretability
Here’s a fun fact: nobody fully understands why large language models work. We know the math, we know the architecture, we can train them. But ask “why did it output this specific token?” and the h...
Large language models are often designed to refuse harmful or sensitive requests. In this work, I identified a “refusal direction” in the Param-1-2.9B-Instruct model and ablated it, effectively con...
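A common way to ablate a direction like this is to project it out of the model's hidden states. The post's exact method isn't shown in this excerpt, so the following is only a minimal sketch of directional ablation on toy vectors, with a hypothetical `ablate` helper standing in for the real intervention:

```python
# Hypothetical sketch of directional ablation: remove the component of a
# hidden activation h along a "refusal direction" r, leaving the rest of
# the representation untouched. The vectors here are toy stand-ins, not
# real Param-1-2.9B-Instruct activations.
import math

def ablate(h, r):
    """Project out direction r from activation h: h - (h . r_hat) r_hat."""
    norm = math.sqrt(sum(x * x for x in r))
    r_hat = [x / norm for x in r]
    dot = sum(a * b for a, b in zip(h, r_hat))
    return [a - dot * b for a, b in zip(h, r_hat)]

h = [3.0, 4.0]       # toy activation
r = [1.0, 0.0]       # toy refusal direction
print(ablate(h, r))  # -> [0.0, 4.0]: the component along r is gone
```

Applied at every layer, this kind of projection removes the model's ability to represent "refuse" along that one direction while (ideally) leaving everything else intact.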
Note: I am still working on this paper, and this iteration may contain errors and assumptions. I will iterate on this idea with a slightly bigger model like Qwen3 0.6B. Abstract I investigate how ...
Using a custom-trained Word2Vec model, centroid-based embeddings, and a FAISS ANN index, I built a semantic search engine that retrieves relevant arXiv papers in <10 ms, even at million-scale. T...
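The core idea — average word vectors into one centroid per document, then rank by cosine similarity — fits in a few lines. The post uses a custom Word2Vec model and a FAISS ANN index; this is only a stdlib sketch with hand-made toy vectors and a linear scan standing in for both:

```python
# Hypothetical sketch: centroid embeddings + brute-force cosine search.
# Toy 3-d "word vectors" stand in for the custom Word2Vec model, and a
# linear scan stands in for the FAISS ANN index.
import math

word_vecs = {
    "neural":  [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "protein": [0.0, 0.9, 0.3],
    "folding": [0.1, 0.8, 0.4],
}

def centroid(words):
    """Average the vectors of known words -> one document embedding."""
    vecs = [word_vecs[w] for w in words if w in word_vecs]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

papers = {
    "Attention Is All You Need": ["neural", "network"],
    "AlphaFold": ["protein", "folding"],
}
index = {title: centroid(words) for title, words in papers.items()}

def search(query_words, k=1):
    q = centroid(query_words)
    ranked = sorted(index, key=lambda t: cosine(q, index[t]), reverse=True)
    return ranked[:k]

print(search(["neural"]))  # -> ['Attention Is All You Need']
```

At million-scale the linear scan is the part FAISS replaces: an approximate index trades a little recall for sub-10 ms lookups.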
Ever wondered how a computer can learn to tell apples from oranges? At the heart of it there is something deceptively simple - the Perceptron. In this post, we’ll discuss the Perceptron and MLPs ...
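The classic perceptron learning rule fits in a dozen lines. As a teaser for the post above, here is a minimal sketch (toy data, not from the post) that learns the linearly separable AND function:

```python
# Minimal perceptron sketch: a step activation plus the classic
# error-driven update rule, trained on the (linearly separable) AND gate.
def step(x):
    return 1 if x >= 0 else 0

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data, epochs=20, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)       # 0 when correct
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_data)
print([predict(w, b, x) for x, _ in and_data])  # -> [0, 0, 0, 1]
```

A single perceptron can only draw one straight line, which is why XOR defeats it and why MLPs stack layers of them.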