concept
active
concept:measuring-faithfulness-in-chain-of-thought-reasoning-lanham-et-al-2023
Measuring Faithfulness in Chain-of-Thought Reasoning (Lanham et al. 2023)
Cited regarding the possibility that misaligned reasoning can be encoded in benign-looking chains of thought
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Medium through which eval awareness is often verbalized; target of intervention.
- Partially addressed in §3.3.4 but remains open, especially for no-CoT settings
- Gap in current evaluation methods; current work relies on CoT monitoring, which may miss unverbalized beliefs.
- under what conditions does chain-of-thought reflect genuine uncertainty resolution versus a learned performance? · question · 0.786 · Key question addressed by the task difficulty analysis comparing MMLU and GPQA-Diamond
- Key mechanistic claim supported by scratchpad modification experiments and conditioning analysis
- Phenomenon where a steering vector intervention causes the model's final output to contradict its own explicitly honest reasoning conclusion
- The condition that commitments are fulfilled.
- Chain-of-thought prompting elicits reasoning in large language models (Wei et al., 2022) · concept · 0.771 · Foundational paper on CoT prompting cited as basis for reasoning LLM training