Focus probe (distracted vs. focused)

One of four emotive concept probes trained; contrastive pair distracted/focused with best layer 10 in LLaMA-3.2-3B

Neighborhood — ranked by edge-count

method

Contrastive mean-difference probe
implements
Probe construction method: the concept vector at each layer is the L2-normalized difference between the mean positive and mean negative representations elicited by contrastive system prompts

concept

Emotive states in LLMs
implements
Directions in activation space associated with contrastive emotive concept pairs studied in this paper as targets for introspection

finding

Interest probe: peak Cohen's d=1.67 (layer 14), p=9.45×10⁻⁶ in LLaMA-3.2-3B
supports
Probe validation result confirming that the interest direction captures meaningful structure
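
The construction and validation steps named in the entries above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the paper's code: the function names, the array shapes, the choice of "focused" as the positive class, and the pooled-standard-deviation form of Cohen's d are all assumptions made for the example.

```python
import numpy as np


def mean_difference_probe(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Return the L2-normalized difference of class means for one layer.

    pos, neg: arrays of shape (n_examples, hidden_dim) holding the
    representations collected under the positive and negative prompts.
    """
    direction = pos.mean(axis=0) - neg.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        raise ValueError("class means are identical; no direction to normalize")
    return direction / norm


def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d between two 1-D samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


# Synthetic stand-ins for one layer's activations (hypothetical data).
rng = np.random.default_rng(0)
hidden_dim = 16
shift = np.zeros(hidden_dim)
shift[0] = 2.0  # plant a separating direction along the first coordinate

focused = rng.normal(size=(50, hidden_dim)) + shift  # assumed positive class
distracted = rng.normal(size=(50, hidden_dim))       # assumed negative class

probe = mean_difference_probe(focused, distracted)

# Project each example onto the probe and measure how well the classes separate.
d = cohens_d(focused @ probe, distracted @ probe)
print("probe norm:", np.linalg.norm(probe))
print("Cohen's d along probe:", d)
```

Repeating this per layer and keeping the layer with the largest effect size is one plausible way to arrive at a "best layer" figure such as those quoted on this page, though the selection criterion actually used is not stated here.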

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interest probe (bored vs. interested) · concept · 0.788
One of four emotive concept probes trained; contrastive pair bored/interested with best layer 14 in LLaMA-3.2-3B
Focused Attention Cycle · concept · 0.773
Formal model of meditation phenomenology: focus → distraction → awareness of distraction → redirection, derived from active inference.
Probes · concept · 0.768
Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
Impulsivity probe (impulsive vs. planning) · concept · 0.757
One of four emotive concept probes trained; contrastive pair impulsive/planning with best layer 13 in LLaMA-3.2-3B
Focus Model · concept · 0.757
Attentive Focus · concept · 0.754
Probes trained under different explicit instruction variants are highly aligned with each other despite different wording · claim · 0.727
Shows the key divide is passive vs. active framing, not the specific wording of instructions.
Activation Probing · concept · 0.713
Technique of reading out model beliefs from internal activations before the final answer token is generated