finding
active
Input embedding similarity baseline selects semantically related but non-reflective tokens (e.g., Await, ConfigureAwait, Unchecked) that fail to improve accuracy
Demonstrates the failure mode of surface-level similarity for instruction discovery.
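A surface-level similarity baseline of the kind named above could be sketched as follows. This is a minimal illustration, not the source paper's implementation: the function name, the placeholder vocabulary, and the 3-dimensional vectors are all hypothetical, and a real run would use a model's input embedding matrix instead.

```python
import numpy as np

def top_k_by_embedding_similarity(embeddings, vocab, seed_token, k=3):
    """Rank vocabulary tokens by cosine similarity to the seed token's input embedding."""
    # Normalize rows so that a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    seed_idx = vocab.index(seed_token)
    scores = unit @ unit[seed_idx]
    # Exclude the seed itself, then keep the k highest-scoring tokens.
    scores[seed_idx] = -np.inf
    order = np.argsort(-scores)[:k]
    return [(vocab[i], float(scores[i])) for i in order]

# Toy data: made-up placeholder tokens and vectors, NOT real model embeddings.
vocab = ["seed", "near_a", "near_b", "near_c", "far"]
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.7, 0.0, 0.3],
    [0.0, 0.0, 1.0],
])

ranked = top_k_by_embedding_similarity(embeddings, vocab, "seed")
print(ranked)
```

The ranking depends only on geometric closeness in embedding space. Nothing in it checks whether a selected token changes the model's behavior, which is consistent with the failure mode this finding reports.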
Source paper
extracted_from · (2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan
Neighborhood — ranked by edge-count
Claims (2)
claim
- Core applied contribution claim, supported by top-k accuracy comparisons.
- Supported by the instruction discovery experiments comparing steering vs. embedding baselines.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Baseline method for instruction discovery using surface-level input embedding similarity instead of steering vectors.
- Demonstrates that surface-level embedding similarity fails to capture reflective semantics.
- How do we incorporate a focus on behavioral relevance in our measures of neural similarity? · question · 0.750 · Direct motivating question for MAS's design principle of causal behavioral matching.
- Shows the passive vs. active divide is more important than the specific wording of instructions.
- Conceptual decomposition arising from the data showing that different models dissociate these traits.
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge.
- Core theoretical claim about the target of representation learning.
- Interpretive claim from Experiment 3: the GPT, Claude, and Gemini families converge on a similar descriptive style despite independent training.