finding
active
Editing the emoticon eye subcomponent to output the unembedding vector for 'o' causes the model to predict shocked faces for all emoticons
Directly overwriting a parameter subcomponent produces a clean behavioral change without any training.
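A toy illustration of this kind of edit, offered as a sketch under stated assumptions rather than the paper's actual procedure or model: it assumes the subcomponent is a rank-one matrix `outer(u, v)` that reads along `v` and writes along `u`, and that the edit replaces `u` with the unembedding row for 'o'. The vocabulary, dimensions, random matrix `W_U`, and helper `predict` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16
tokens = [")", "(", "o", "D", "P"]  # hypothetical emoticon "mouth" tokens

# Toy unembedding matrix, one unit-norm row per token (an assumption, not real weights).
W_U = rng.normal(size=(len(tokens), d_model))
W_U /= np.linalg.norm(W_U, axis=1, keepdims=True)

# Rank-one subcomponent: reads along v (the "eyes" direction), writes along u.
v = rng.normal(size=d_model)
v /= np.linalg.norm(v)
u = W_U[tokens.index(")")].copy()  # originally writes toward ")"
W_sub = np.outer(u, v)

def predict(W, x):
    """Return the token whose unembedding row best matches the subcomponent output."""
    logits = W_U @ (W @ x)
    return tokens[int(np.argmax(logits))]

x = 3.0 * v  # a residual input that activates the subcomponent (eyes were seen)
before = predict(W_sub, x)

# The edit: overwrite the output direction with the unembedding vector for 'o'.
W_edited = np.outer(W_U[tokens.index("o")], v)
after = predict(W_edited, x)

print(before, "->", after)
```

Because the rows of `W_U` are unit-norm and the activation `v @ x` is positive, the token whose row equals the output direction always gets the largest logit in this toy setup. The edit changes only the write direction, so the subcomponent still fires on the same inputs.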
Source paper
extracted_from
Neighborhood (ranked by edge count)
Claims (1)
claim
- The ability to make precise edits demonstrates that VPD identifies real computational machinery · supports · Claim that editing success validates VPD's decomposition.
Communities (3)
community
- Few-shot anchoring & latent structure · members_of · How minimal examples disambiguate and recruit latent arithmetic/reasoning interpretations in LLMs
- Direct modification of model subcomponents (MLPs, embeddings, unembedding vectors) to predictably alter outputs without retraining, using rank-one constraints.
- Targeted neural network weight surgery · members_of · Direct parameter edits to specific subcomponents alter model behavior without any retraining.
Questions (1)
question
- Implicit question driving the editing experiment.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Demonstrated that VPD-discovered subcomponents encode true computational machinery by enabling targeted, predictable behavior changes without gradient-based training.
- The part of the emoticon subcomponent responsible for recognizing the 'eyes' of emoticons like ';', ':' or '=', which was edited in the demo.
- CLIP training paradigm finding in cross-modal alignment
- Specific discovered subcomponent that activates on punctuation like ' :', ' ;', ' =', ':-' and predicts the rest of emoticons/emojis.
- Shows NLA explanations capture latent model beliefs about rewards before output selection; validates interpretability.
- The functional role of a specific VPD subcomponent in predicting emoticon/emoji continuations after punctuation.
- AE-2: Embodiment: Modeling output-input contingencies and using the model in perception or control · concept · 0.724 · Indicator of embodiment requiring forward models used for perception or motor control.
- Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights