chain-of-thought

A technique in which a model outputs intermediate reasoning steps before its final answer; used here to detect verbalized eval awareness.

Neighborhood — ranked by edge-count

paper

concept

Performative chain-of-thought
related_to
Central concept: verbalized reasoning that occurs after the model has already internally settled on an answer, particularly on easier tasks.
Chain-of-Thought Reasoning
related_to
Medium through which eval awareness is often verbalized; target of intervention.
Factored cognition / chain-of-thought
related_to
Using multi-step reasoning by generating intermediate thoughts.
verbalized eval awareness
associated_with
The phenomenon where a model explicitly states in its chain-of-thought that it is being evaluated, tested, or benchmarked.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Chain-of-thought prompting · method · 0.889
Technique by which LLMs generate intermediate reasoning steps before final output; used by ChatGPT o3.
Chain-of-Thought (CoT) · framework · 0.879
A prompting technique that elicits intermediate reasoning steps before final answer inference in language models.
Unfaithful Chain-of-Thought · concept · 0.845
Phenomenon where a steering-vector intervention causes the model's final output to contradict the explicitly honest conclusion of its own reasoning.
Inner monologue / chain-of-thought in LLMs · concept · 0.800
The hidden reasoning steps generated by recent LLMs before visible output; mentioned in the technology section.
Chain-of-thought prompting elicits reasoning in large language models (Wei et al., 2022) · concept · 0.788
Foundational paper on CoT prompting, cited as the basis for reasoning LLM training.
Measuring Faithfulness in Chain-of-Thought Reasoning (Lanham et al. 2023) · concept · 0.769
Cited regarding the possibility of encoding misaligned reasoning in benign chains-of-thought.
Under what conditions does chain-of-thought reflect genuine uncertainty resolution versus a learned performance? · question · 0.761
Key question addressed by the task difficulty analysis comparing MMLU and GPQA-Diamond.
A small number of high-quality human demonstrations of chain-of-thought reasoning could be used to improve and focus performance. · hypothesis · 0.760
Section 6 mentions that high-quality human demos could improve natural language feedback.
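
The "cosine ≥ 0.65 · no typed edge" list above can be read as a filter over entity embeddings. The following is a minimal sketch of that filter, not the page's actual implementation: the function names (`cosine`, `untyped_neighbors`), the dictionary-of-vectors input, and the edge representation as name pairs are all assumptions made for illustration. Only the 0.65 threshold comes from the page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs similar to `target` but with no typed edge to it.

    embeddings  -- hypothetical dict: entity name -> embedding vector
    typed_edges -- hypothetical iterable of (name_a, name_b) pairs, treated as undirected
    threshold   -- minimum cosine similarity; 0.65 is the value shown on the page
    """
    # Entities that already share a typed relation with the target are excluded.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are the candidates for new edges or duplicate merging described above; the entities that already carry `related_to` or `associated_with` edges would be skipped.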