concept
active
concept:interest-probe-bored-vs-interested

Interest probe (bored vs. interested)

One of four emotive concept probes trained on LLaMA-3.2-3B. It uses the contrastive pair bored/interested, and its best layer is 14.

Neighborhood — ranked by edge-count

Methods (1)

method
  • Probe construction method: the concept vector at each layer is the L2-normalized difference between the mean positive and mean negative representations, both obtained from contrastive system prompts
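
The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the source's implementation: the function name `concept_vector`, the array shapes, and the random toy activations are all assumptions, as is the choice to treat "interested" as the positive class and "bored" as the negative one.

```python
import numpy as np

def concept_vector(pos, neg):
    """Return the L2-normalized difference of class means for one layer.

    pos, neg: arrays of shape (n_prompts, hidden_dim) holding the
    representations collected under the positive and negative prompts.
    """
    diff = pos.mean(axis=0) - neg.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0:
        raise ValueError("positive and negative means coincide")
    return diff / norm

# Toy data standing in for real activations (hypothetical values).
rng = np.random.default_rng(0)
hidden_dim = 8
interested = rng.normal(loc=0.5, size=(16, hidden_dim))   # assumed positive class
bored = rng.normal(loc=-0.5, size=(16, hidden_dim))       # assumed negative class

v = concept_vector(interested, bored)
```

Repeating this per layer would give one unit-length direction for each layer; by construction, the positive class mean projects higher onto `v` than the negative class mean.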

Concepts (1)

concept
  • Directions in activation space associated with contrastive emotive concept pairs studied in this paper as targets for introspection

Findings (1)

finding

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Interest · concept · 0.829
    Something that benefits or harms a being; tied to welfare subjectivity.
  • One of four emotive concept probes trained; contrastive pair distracted/focused with best layer 10 in LLaMA-3.2-3B
  • Curiosity · concept · 0.769
    Active sampling of novel contingencies to minimize uncertainty; formalized as novelty component of expected free energy
  • Probes · concept · 0.758
    Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
  • One of four emotive concept probes trained; contrastive pair impulsive/planning with best layer 13 in LLaMA-3.2-3B
  • One of four emotive concept probes trained; contrastive pair sad/happy with best layer 16 in LLaMA-3.2-3B
  • The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets
  • Probe score · concept · 0.705
    Dot product between the hidden state and the concept vector, averaged across a 5-layer window around the best layer; used as a measure of the model's internal emotive state
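
The probe score definition above can be read as the following sketch. Several details are assumptions rather than facts from the source: the window is taken as centered on the best layer (layers 12 to 16 for best layer 14) and clipped at the ends of the stack, each layer contributes a single hidden-state vector, and the name `probe_score` and the toy dimensions are hypothetical.

```python
import numpy as np

def probe_score(hidden, vectors, best_layer, window=5):
    """Average per-layer dot products over a window around best_layer.

    hidden:  array (n_layers, hidden_dim), one hidden state per layer.
    vectors: array (n_layers, hidden_dim), one concept vector per layer.
    The window is centered on best_layer and clipped to valid layers
    (an assumption; the source only says "around best layer").
    """
    half = window // 2
    lo = max(0, best_layer - half)
    hi = min(hidden.shape[0], best_layer + half + 1)
    # Row-wise dot product: one scalar per layer in the window.
    dots = np.einsum("ld,ld->l", hidden[lo:hi], vectors[lo:hi])
    return float(dots.mean())

# Toy example: every concept vector is the first basis vector, and the
# hidden state at layer l has magnitude l along that direction.
n_layers, hidden_dim = 20, 8
vectors = np.zeros((n_layers, hidden_dim))
vectors[:, 0] = 1.0
hidden = np.zeros((n_layers, hidden_dim))
hidden[:, 0] = np.arange(n_layers)

score = probe_score(hidden, vectors, best_layer=14)  # mean of 12..16 = 14.0
```

Because the concept vectors are unit length, each per-layer term is the signed length of the hidden state's projection onto that layer's direction, so the score is an average projection over the window.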