finding
'Italian' is among the top five returned logits after parallel multi-source interchange intervention mixing language and Italy activations in GPT-2
Demonstrates semantic mixing: parallel interventions that combine activations from two source inputs ("language" and "Italy") produce the expected composite output ("Italian").
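The mechanism behind this finding can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not GPT-2 and not pyvene's API. It uses a 4-dimensional hidden vector in which dimensions 0-1 are assumed to encode concept type (language vs. country) and dimensions 2-3 are assumed to encode which country, plus a hand-written unembedding table. The base input "France" is also an assumption. The parallel multi-source interchange intervention runs the base and each source ("language", "Italy") independently, then writes each source's assigned subspace into the base hidden state in a single pass. Neither source sees the other's intervention.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical 4-d toy representation (NOT GPT-2): dims 0-1 encode the
# concept type (language vs. country), dims 2-3 encode which country.
EMBED: Dict[str, Vector] = {
    "language": [1.0, 0.0, 0.0, 0.0],
    "Italy":    [0.0, 1.0, 1.0, 0.0],
    "France":   [0.0, 1.0, 0.0, 1.0],
}

# Hypothetical unembedding rows used to score each output token.
UNEMBED: Dict[str, Vector] = {
    "Italian":  [1.0, 0.0, 1.0, 0.0],
    "French":   [1.0, 0.0, 0.0, 1.0],
    "Italy":    [0.0, 1.0, 1.0, 0.0],
    "France":   [0.0, 1.0, 0.0, 1.0],
    "language": [1.0, 0.0, 0.0, 0.0],
}


def hidden(token: str) -> Vector:
    """Toy layer (ReLU over the embedding); this is the intervention site."""
    return [max(0.0, x) for x in EMBED[token]]


def logits(h: Vector) -> Dict[str, float]:
    """Readout: dot product of the hidden vector with each unembedding row."""
    return {w: sum(a * b for a, b in zip(h, row)) for w, row in UNEMBED.items()}


def parallel_interchange(base: str, sources: Sequence[Tuple[str, slice]]) -> Vector:
    """Run the base and every source independently, then overwrite each
    source's assigned subspace of the base hidden state in one pass."""
    h = list(hidden(base))
    for src, dims in sources:
        # Each source is computed from its own input, unaffected by the others.
        h[dims] = hidden(src)[dims]
    return h


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring output tokens."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


mixed = parallel_interchange("France", [("language", slice(0, 2)), ("Italy", slice(2, 4))])
print(mixed)                    # [1.0, 0.0, 1.0, 0.0]
print(top_k(logits(mixed), 1))  # ['Italian']
```

In the sketch, the mixed vector scores "Italian" highest, while the unmodified "France" run scores "France" highest. In a real model, each intervention would target a specific layer, token position, and component. The subspaces would have to be identified empirically rather than assumed.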
Source paper
Extracted from (2024) · Zhengxuan Wu · Atticus Geiger · Aryaman Arora · Jing Huang +4
Neighborhood (ranked by edge count)
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge. Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Demonstrates NLAs' ability to surface hypotheses that led to discovery of the root cause (malformed training data).
- NLAs revealed unverbalized language processing in Opus 4.6 that led to discovery of malformed SFT training data.
- We hypothesize that intervention efficiency can be scaled with multi-node and multi-GPU training as language models grow larger (hypothesis · 0.736): Future work hypothesis about scaling pyvene's computational efficiency for very large models.
- Foundational paper introducing activation steering methodology used in this work
- Importance of recursive generation.
- Contrasts with temporal permutation results; constitutes the most suggestive evidence of potential consciousness phenomena in LLM representations.
- Points out that Parlog requires explicit, stream-count-dependent merging code.
- Empirical validation of the empathic immersion method from the 1969-70 UN Lima competition