finding

active

finding:italian-is-among-the-top-five-returned-logits-after-parallel-multi-source-interchange-intervention-mixing-language-and-italy-activations-in-gpt-2

'Italian' is among the top five returned logits after parallel multi-source interchange intervention mixing language and Italy activations in GPT-2

Demonstrates that parallel interventions can mix semantics from multiple sources in a single forward pass and produce the expected composite output
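To make the mechanism concrete, here is a minimal toy sketch of a parallel multi-source interchange intervention in plain Python. It does not use pyvene or GPT-2: the two-dimensional "activations", the prompts, the answer table, and every function name are hypothetical stand-ins chosen for illustration. It only shows the idea that activations from two different source runs are swapped into one base run at the same time, and that the rest of the computation then reads out the mixed result.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical 2-d "activations": [relation axis, country axis].
# relation: -1 = capital, +1 = language; country: -1 = Spain, +1 = Italy.
TOY_ACTIVATIONS: Dict[str, Vector] = {
    "The": [0.0, 0.0],
    "capital": [-1.0, 0.0],
    "language": [1.0, 0.0],
    "of": [0.0, 0.0],
    "Spain": [0.0, -1.0],
    "Italy": [0.0, 1.0],
    "is": [0.0, 0.0],
}

# Toy readout table keyed by (relation is language?, country is Italy?).
ANSWERS: Dict[Tuple[bool, bool], str] = {
    (False, False): "Madrid",
    (False, True): "Rome",
    (True, False): "Spanish",
    (True, True): "Italian",
}


def run_lower_layers(prompt: str) -> List[Vector]:
    """Stand-in for the layers below the intervention site."""
    return [list(TOY_ACTIVATIONS[token]) for token in prompt.split()]


def run_upper_layers(hidden: List[Vector]) -> str:
    """Stand-in for the layers above the intervention site plus decoding."""
    relation = sum(vec[0] for vec in hidden)
    country = sum(vec[1] for vec in hidden)
    return ANSWERS[(relation > 0, country > 0)]


def parallel_interchange(
    base_hidden: List[Vector],
    swaps: List[Tuple[List[Vector], int]],
) -> List[Vector]:
    """Copy one position from each source into the base, all in one pass."""
    patched = [list(vec) for vec in base_hidden]  # leave the base untouched
    for source_hidden, position in swaps:
        patched[position] = list(source_hidden[position])
    return patched


base = run_lower_layers("The capital of Spain is")
language_source = run_lower_layers("The language of Spain is")
italy_source = run_lower_layers("The capital of Italy is")

# Swap in "language" (position 1) and "Italy" (position 3) simultaneously.
patched = parallel_interchange(base, [(language_source, 1), (italy_source, 3)])

print(run_upper_layers(base))     # Madrid
print(run_upper_layers(patched))  # Italian
```

In a real model the swapped units would be hidden-state vectors at a chosen layer and token position, and the outcome would be read from the ranked output logits rather than a lookup table, which is why the finding is phrased as 'Italian' appearing among the top five.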

Source paper

extracted_from

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

(2024) · Zhengxuan Wu · Atticus Geiger · Aryaman Arora · Jing Huang +4

Neighborhood — ranked by edge-count

Papers (1)

paper

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
introduces

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Opus 4.6 spontaneously responded in Russian to an English prompt; NLA explanations revealed the model was fixated on the hypothesis that the user was a non-native English speaker. · finding · 0.740
Demonstrates NLAs' ability to surface hypotheses that lead to discovery of root cause (malformed training data).
Opus 4.6 represented target language internally before switching languages, with persistent Russian representations appearing before plausible textual cues · finding · 0.737
NLAs revealed unverbalized language processing in Opus 4.6 that led to discovery of malformed SFT training data.
We hypothesize that intervention efficiency can be scaled with multi-node and multi-GPU training as language models grow larger · hypothesis · 0.736
Future work hypothesis about scaling pyvene's computational efficiency for very large models
Steering Language Models With Activation Engineering (Turner et al., 2023) · concept · 0.728
Foundational paper introducing activation steering methodology used in this work
GPT's ability to simulate text automata is the source of its most surprising and pivotal implications for paths to superintelligence. · claim · 0.725
Importance of recursive generation.
Under spatio permutation controls, two cases (Layer 32 of Mixtral-8x7B on Strange Stories, IIT 4.0, Linguistic Spans: Entire and Complement) satisfy all three criteria. · finding · 0.723
Contrasts with temporal permutation results; constitutes the most suggestive evidence of potential consciousness phenomena in LLM representations.
The Parlog86 merge process code depends on the number of streams, making it less flexible than Linda for multiple clients. · claim · 0.721
Pointing out that Parlog requires explicit, stream-count-dependent merging code.
Alexander team's Peru pattern language judged by Peruvians to be more accurate than Peruvian architects' work, per UN competition jurors' report · finding · 0.718
Empirical validation of the empathic immersion method from the 1969-70 UN Lima competition