Interest probe (bored vs. interested)

One of four emotive concept probes trained; contrastive pair bored/interested with best layer 14 in LLaMA-3.2-3B

Neighborhood — ranked by edge-count

method

Contrastive mean-difference probe
implements
Probe construction method: concept vector at each layer is L2-normalized difference between mean positive and mean negative representations from contrastive system prompts
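
As a minimal sketch of the construction described above, the following computes one concept vector per layer from cached activations. The array layout (prompts, layers, hidden size), the function name concept_vectors, and the random toy data are assumptions made for illustration; the source does not specify how activations are pooled over tokens or stored.

```python
import numpy as np

def concept_vectors(pos, neg, eps=1e-12):
    """Return one unit-norm concept vector per layer.

    pos, neg: arrays of shape (n_prompts, n_layers, d_model) holding
    representations collected under the positive and negative system
    prompts (assumed layout, not taken from the source).
    """
    # Difference between mean positive and mean negative representations.
    diff = pos.mean(axis=0) - neg.mean(axis=0)          # (n_layers, d_model)
    # L2-normalize each layer's difference vector.
    norms = np.linalg.norm(diff, axis=-1, keepdims=True)
    return diff / np.maximum(norms, eps)

# Toy data standing in for real activations (hypothetical shapes).
rng = np.random.default_rng(0)
pos = rng.normal(loc=0.5, size=(8, 4, 16))   # e.g. "interested" prompts
neg = rng.normal(loc=-0.5, size=(8, 4, 16))  # e.g. "bored" prompts
vecs = concept_vectors(pos, neg)
```

Each row of vecs is the direction for one layer; the eps guard only matters in the degenerate case where the two means coincide.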

concept

Emotive states in LLMs
implements
Directions in activation space associated with contrastive emotive concept pairs studied in this paper as targets for introspection

finding

Interest probe: peak Cohen's d=1.67 (layer 14), p=9.45×10⁻⁶ in LLaMA-3.2-3B
supports
Probe validation result confirming interest direction captures meaningful structure
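
For reference, Cohen's d can be computed as sketched below. This assumes the common pooled-standard-deviation form applied to two groups of scalar values (for example, projections of positive and negative examples onto the interest direction); the source does not state which variant or which samples were used, and the function name cohens_d is illustrative.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (assumed variant)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    # Pooled variance from the two unbiased sample variances.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Toy example: means 4 and 3, both sample variances 4, so d = 1 / 2.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

Under this definition, d=1.67 would mean the two group means are separated by about 1.67 pooled standard deviations.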

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interest · concept · 0.829
Something that benefits or harms a being; tied to welfare subjectivity.
Focus probe (distracted vs. focused) · concept · 0.788
One of four emotive concept probes trained; contrastive pair distracted/focused with best layer 10 in LLaMA-3.2-3B
Curiosity · concept · 0.769
Active sampling of novel contingencies to minimize uncertainty; formalized as novelty component of expected free energy
Probes · concept · 0.758
Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
Impulsivity probe (impulsive vs. planning) · concept · 0.736
One of four emotive concept probes trained; contrastive pair impulsive/planning with best layer 13 in LLaMA-3.2-3B
Wellbeing probe (sad vs. happy) · concept · 0.729
One of four emotive concept probes trained; contrastive pair sad/happy with best layer 16 in LLaMA-3.2-3B
Probe Generalization · concept · 0.705
The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets
Probe score · concept · 0.705
Dot product between hidden state and concept vector averaged across 5-layer window around best layer; measures model's internal emotive state
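
The probe score definition above can be sketched as follows. The window is assumed to be centered on the best layer (two layers on each side) and clipped at the ends of the layer stack; the source only says "5-layer window around best layer". The function name probe_score and the toy shapes are hypothetical.

```python
import numpy as np

def probe_score(hidden, vectors, best_layer, window=5):
    """Mean dot product of hidden states and concept vectors over a layer window.

    hidden:  (n_layers, d_model) hidden states for one input (assumed layout)
    vectors: (n_layers, d_model) per-layer concept vectors
    """
    half = window // 2
    lo = max(best_layer - half, 0)
    hi = min(best_layer + half + 1, hidden.shape[0])
    # Per-layer dot product between hidden state and concept vector.
    per_layer = np.einsum("ld,ld->l", hidden[lo:hi], vectors[lo:hi])
    return float(per_layer.mean())

# Toy check: every per-layer dot product is 4 * 0.5 = 2.0.
score = probe_score(np.ones((20, 4)), np.full((20, 4), 0.5), best_layer=14)
```

With best_layer=14, as reported for the interest probe, this averages over layers 12 through 16.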