Input Embedding Similarity Baseline

Baseline method for instruction discovery using surface-level input embedding similarity instead of steering vectors.
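A minimal sketch of what a surface-level input embedding similarity baseline could look like. It assumes an input-embedding matrix with one row per vocabulary token and a query formed by averaging the embeddings of a few known seed instruction tokens. The query construction, the name `rank_by_input_embedding`, and `top_k` are illustrative assumptions, not the paper's code. No steering vector or model activation is involved, only static input embeddings.

```python
import numpy as np

def rank_by_input_embedding(embeddings, seed_ids, top_k=5):
    """Hypothetical baseline: rank vocabulary tokens by cosine similarity
    to the mean input embedding of known seed tokens (assumed query)."""
    # Unit-normalize every row of the (vocab_size, d) embedding matrix.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    # Assumed query: mean of the seed tokens' unit embeddings, re-normalized.
    query = unit[seed_ids].mean(axis=0)
    query = query / max(np.linalg.norm(query), 1e-12)
    scores = unit @ query          # cosine similarity for every token
    scores[seed_ids] = -np.inf     # never re-propose the seed tokens
    order = np.argsort(-scores)[:top_k]
    return order.tolist(), scores[order].tolist()
```

Because the ranking only sees input-embedding geometry, it rewards tokens that look related on the surface, whether or not they induce reflection.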

Neighborhood (ranked by edge count)

method

Cosine Similarity Ranking for Instruction Discovery
associated_with
Method to discover new reflection-inducing instructions by ranking candidate tokens by cosine similarity to steering vectors.
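For contrast, a minimal sketch of steering-vector-based ranking. It assumes the steering vector is the difference of mean activations between reflective and non-reflective examples (a common construction, not confirmed by this entry) and that candidate token vectors live in the same d-dimensional space as that vector. `difference_of_means` and `rank_tokens_by_steering` are hypothetical names.

```python
import numpy as np

def difference_of_means(pos_acts, neg_acts):
    """Assumed steering vector: mean(reflective) - mean(non-reflective).
    pos_acts, neg_acts: (num_examples, d) activation arrays."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def rank_tokens_by_steering(token_vecs, steering, top_k=5):
    """Rank candidate token vectors by cosine similarity to the steering vector."""
    norms = np.linalg.norm(token_vecs, axis=1, keepdims=True)
    unit = token_vecs / np.clip(norms, 1e-12, None)
    s = steering / max(np.linalg.norm(steering), 1e-12)
    scores = unit @ s
    order = np.argsort(-scores)[:top_k]
    return order.tolist(), scores[order].tolist()
```

The only change from the baseline is the query: a direction derived from model activations replaces an average of surface-level input embeddings.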

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
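One way such a neighborhood could be computed, as a sketch assuming each entity has a vector and typed edges are stored as unordered name pairs. The function name `untyped_neighbors` and the data layout are hypothetical; only the 0.65 cosine threshold and the "no typed edge" condition come from this page.

```python
import numpy as np

def untyped_neighbors(target, vectors, typed_edges, threshold=0.65):
    """Return (name, cosine) pairs for entities whose cosine similarity to
    `target` is >= threshold and that share no typed edge with it,
    sorted by descending similarity. Data layout is an assumption."""
    t = vectors[target] / np.linalg.norm(vectors[target])
    hits = []
    for name, v in vectors.items():
        if name == target:
            continue
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue  # already linked by a typed relation, so not listed
        cos = float(v @ t / np.linalg.norm(v))
        if cos >= threshold:
            hits.append((name, cos))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```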

The input embedding similarity baseline selects semantically related but non-reflective tokens (e.g., Await, ConfigureAwait, Unchecked) that fail to improve accuracy · finding · 0.853
Demonstrates the failure mode of surface-level similarity for instruction discovery.
Steering-vector-based instruction discovery outperforms the input embedding similarity baseline for reflection-inducing instruction selection · finding · 0.779
Demonstrates that surface-level embedding similarity fails to capture reflective semantics.
Vector Embedding Representation · concept · 0.766
The specific type of representation studied in the paper: a function f: X → R^n that assigns a feature vector to each input.
Mutual Embedding · concept · 0.764
A reinforcing interlock between different materials, mentioned alongside Deep Interlock in West Dean construction.
Activation Similarity · concept · 0.750
Model-independent feature comparison based on correlating activation vectors across a fixed, diverse dataset.
Certain representation learning algorithms boil down to a simple rule: find an embedding in which similarity equals PMI · claim · 0.745
Core theoretical claim about the target of representation learning; PMI is pointwise mutual information.
span embedding analysis · method · 0.740
Extracting embeddings from instruction and example spans.
Functional Similarity · concept · 0.739
Similarity measured with respect to network behavior/function rather than statistical correlation of activations.
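The Activation Similarity entry describes correlating activation vectors across a fixed dataset. A minimal sketch of one common instantiation: Pearson correlation between every feature of model A and every feature of model B over the same N inputs, plus a max-correlation match per feature. The matching rule and both function names are assumptions for illustration.

```python
import numpy as np

def activation_correlation(acts_a, acts_b):
    """Pearson correlation between every feature of model A and every
    feature of model B over the same N inputs.
    acts_a: (N, features_a), acts_b: (N, features_b)."""
    za = (acts_a - acts_a.mean(axis=0)) / acts_a.std(axis=0)
    zb = (acts_b - acts_b.mean(axis=0)) / acts_b.std(axis=0)
    # With population z-scores, mean(za * zb) is exactly Pearson's r.
    return za.T @ zb / acts_a.shape[0]

def best_match(acts_a, acts_b):
    """For each feature of A, the index and correlation of its most
    correlated feature in B (hypothetical matching rule)."""
    corr = activation_correlation(acts_a, acts_b)
    idx = corr.argmax(axis=1)
    return idx, corr[np.arange(corr.shape[0]), idx]
```

Because the comparison uses only activations on shared inputs, it does not depend on either model's architecture, which is what makes it model-independent.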
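The PMI claim can be made concrete with PMI(x, y) = log p(x, y) / (p(x) p(y)). The sketch below builds a PMI matrix from co-occurrence counts and factors it with a truncated SVD, so that the inner product of a word vector and a context vector reproduces PMI (exactly at full rank, approximately when truncated). SVD of a PMI matrix is a standard illustration of the claim, not necessarily the algorithm the source analyzes. It takes inner product as the similarity and assumes all counts are positive.

```python
import numpy as np

def pmi_matrix(counts):
    """PMI[i, j] = log p(i, j) / (p(i) p(j)) from a co-occurrence count
    matrix. Assumes every count is positive so the log is finite."""
    joint = counts / counts.sum()
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    return np.log(joint / (p_row * p_col))

def pmi_embeddings(counts, dim):
    """Factor PMI ~= W @ C.T with a rank-`dim` truncated SVD."""
    pmi = pmi_matrix(counts)
    u, s, vt = np.linalg.svd(pmi, full_matrices=False)
    root = np.sqrt(s[:dim])
    word_vecs = u[:, :dim] * root       # one row per word
    context_vecs = vt[:dim].T * root    # one row per context
    return pmi, word_vecs, context_vecs
```

Splitting the singular values evenly between the two factors is one conventional choice. Any split whose product is diag(s) yields the same inner products.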
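For the span embedding analysis entry, a minimal sketch of extracting one embedding per span. It assumes a (seq_len, d) array of hidden states and half-open token ranges for the instruction and example spans. Mean pooling and cosine comparison are assumptions, and `span_embedding` and `compare_spans` are hypothetical names.

```python
import numpy as np

def span_embedding(hidden, start, end):
    """Mean-pool hidden states over tokens [start, end) (assumed pooling)."""
    if not 0 <= start < end <= hidden.shape[0]:
        raise ValueError("empty or out-of-range span")
    return hidden[start:end].mean(axis=0)

def compare_spans(hidden, instruction_span, example_span):
    """Embed both spans and return them with their cosine similarity."""
    ins = span_embedding(hidden, *instruction_span)
    ex = span_embedding(hidden, *example_span)
    cos = float(ins @ ex / (np.linalg.norm(ins) * np.linalg.norm(ex)))
    return ins, ex, cos
```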