framework
active
framework:reasoning-theater-framework
Reasoning Theater Framework
The conceptual framework introduced by the paper, which uses activation probing to distinguish performative chain-of-thought (CoT) from genuine reasoning.
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (2)
method
- CoT Monitor (uses): Named method that monitors chain-of-thought text to detect when the model signals its answer; compared against activation probes
- Named evaluation protocol: truncating CoT at various points and forcing the model to give a final answer, to measure when the answer stabilizes
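
The truncation protocol described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `forced_answers`, `stabilization_point`, and `toy_answer_fn` are hypothetical names, and `answer_fn` stands in for whatever call truncates the CoT and forces the model to commit to a final answer.

```python
from typing import Callable, List, Sequence


def forced_answers(steps: Sequence[str], answer_fn: Callable[[str], str]) -> List[str]:
    """Force an answer after every CoT prefix: 0 steps, 1 step, ..., all steps."""
    return [answer_fn("\n".join(steps[:k])) for k in range(len(steps) + 1)]


def stabilization_point(answers: Sequence[str]) -> int:
    """Return the smallest k such that answers[k:] all equal the final answer."""
    final = answers[-1]
    k = len(answers) - 1
    while k > 0 and answers[k - 1] == final:
        k -= 1
    return k


# Toy stand-in for a model (assumption): it switches to "B" once the
# prefix contains the decisive step.
def toy_answer_fn(prefix: str) -> str:
    return "B" if "rule out A" in prefix else "A"


steps = [
    "Restate the question.",
    "We can rule out A.",
    "Double-check the arithmetic.",
    "Final: B.",
]
answers = forced_answers(steps, toy_answer_fn)
k_stable = stabilization_point(answers)  # stabilizes at k = 2
```

In this toy run the answer is already fixed after two of four steps, so the remaining steps do not change the outcome. That is the kind of gap the protocol is meant to measure.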
Concepts (1)
concept
- Technique of reading out model beliefs from internal activations before the final answer token is generated
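
As a rough illustration of reading a belief out of activations, the sketch below fits a linear probe (logistic regression) on synthetic vectors. Everything here is an assumption made for illustration: the entry does not specify a probe architecture, the "activations" are random vectors with the label planted along one dimension, and `train_linear_probe` and `probe_predict` are hypothetical helpers. In practice the inputs would be hidden states captured before the final answer token.

```python
import math
import random


def train_linear_probe(acts, labels, lr=0.1, epochs=200):
    """Fit logistic regression (a linear probe) by full-batch gradient descent."""
    dim, n = len(acts[0]), len(acts)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(acts, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_predict(w, b, x):
    """Read out the binary belief encoded along the probe direction."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Synthetic "activations" (assumption): the label is planted in dimension 0,
# the other dimensions are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
acts = [
    [rng.gauss(0, 0.5) + (2.0 if y else -2.0), rng.gauss(0, 1), rng.gauss(0, 1)]
    for y in labels
]

# Train on the first 150 examples, evaluate on the held-out 50.
w, b = train_linear_probe(acts[:150], labels[:150])
held_out = list(zip(acts[150:], labels[150:]))
accuracy = sum(probe_predict(w, b, x) == y for x, y in held_out) / len(held_out)
```

High held-out accuracy indicates the information is linearly decodable from the vectors. It does not by itself show that the model uses that information.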
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- 1984 Ashton-Tate integrated system with frames, FRED language, and overlapping windows; design reference for Playground's approach.
- The primary conceptual framework proposed: understanding dialogue agent behaviour as role play of characters
- Bostrom's category of AIs that perform specific tasks without overarching goals.
- Bostrom's category of AIs that produce desired results given commands but do not act autonomously.
- The view of AI as a question-answer system optimized for correctness, often inherited from supervised learning.
- Foundational mechanistic interpretability paper on transformer circuit analysis