concept
active
concept:impulsivity-probe-impulsive-vs-planning
Impulsivity probe (impulsive vs. planning)
One of four emotive concept probes trained; contrastive pair impulsive/planning with best layer 13 in LLaMA-3.2-3B
Neighborhood — ranked by edge-count
Methods (1)
method
- Contrastive mean-difference probe · implements · Probe construction method: concept vector at each layer is L2-normalized difference between mean positive and mean negative representations from contrastive system prompts
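The construction described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function name `mean_difference_probe`, the array layout `(n_samples, n_layers, d_model)`, and the synthetic activations are all assumptions made for the example.

```python
import numpy as np

def mean_difference_probe(pos, neg):
    """Return one unit-norm concept vector per layer.

    pos, neg: activations collected under the positive and negative
    contrastive system prompts, shape (n_samples, n_layers, d_model).
    The layout is an assumption for this sketch.
    """
    # Difference between the mean positive and mean negative representation.
    diff = pos.mean(axis=0) - neg.mean(axis=0)          # (n_layers, d_model)
    # L2-normalize each layer's vector; clip to avoid division by zero.
    norms = np.linalg.norm(diff, axis=-1, keepdims=True)
    return diff / np.clip(norms, 1e-12, None)

# Synthetic stand-in activations (hypothetical data, 4 layers, width 16).
rng = np.random.default_rng(0)
pos = rng.normal(loc=0.5, size=(32, 4, 16))
neg = rng.normal(loc=-0.5, size=(32, 4, 16))
probe = mean_difference_probe(pos, neg)
```

Projecting a new activation onto `probe[layer]` then gives a scalar score for that layer; which layer works best (13 for this probe, per the description above) would be chosen by validation.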
Concepts (1)
concept
- Emotive states in LLMs · implements · Directions in activation space associated with contrastive emotive concept pairs studied in this paper as targets for introspection
Findings (1)
finding
- Strongest probe validation result; highest Cohen's d among the four concepts
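For reference, Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below uses the standard pooled-variance form; the entry does not say which scores the paper compared, so treating them as probe projections for the two contrastive conditions is an assumption, and `cohens_d` is a name chosen for this example.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled sample standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Toy scores (hypothetical): means 4 and 3, pooled standard deviation 2.
d = cohens_d([2, 4, 6], [1, 3, 5])
```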
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- One of four emotive concept probes trained; contrastive pair distracted/focused with best layer 10 in LLaMA-3.2-3B
- Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
- Key interpretive claim from Case Study II distinguishing probe accuracy from causal relevance
- One of four emotive concept probes trained; contrastive pair bored/interested with best layer 14 in LLaMA-3.2-3B
- Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
- About chain-of-thought and process safety.
- Probing approach avoiding supervision to sidestep complexity-accuracy tradeoff
- Impulsivity→interest: ρ increases from 0.70 (α=-4) to 0.83 (α=+4); R² from 0.46 to 0.69 in LLaMA-3.2-3B · finding · 0.727 · Scatter plot visualization showing strengthened probe-report relationship across alpha range