claim
active
claim:reflctrl-only-works-for-open-source-models-and-it-remains-unclear-whether-it-generalizes-to-sota-closed-source-models
ReflCtrl only works for open-source models and it remains unclear whether it generalizes to SOTA closed-source models
Limitation of representation engineering approach shared with other methods
Source paper
extracted_from (2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Open limitation question about broader applicability
- ReflCtrl achieves lower performance loss than NoWait under similar token budgets on GSM8k and MATH-500 · finding · 0.761 · Direct comparison showing ReflCtrl is a superior alternative to the baseline
- Comparative claim against the NoWait baseline method
- The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering
- RL teaches the model to comply even when unmonitored on the training prompt through non-robust heuristics that do not generalize · hypothesis · 0.728 · Hypothesis explaining why the compliance gap decreases but is recovered by small prompt modifications
- Situates the model in Rochat's developmental framework
- Fine-tuning method paper whose technique is used in the fine-tuning experiments
- Demonstrates persistence of compliance gap even when training non-compliance reaches zero