finding
pending-review
finding:detecting-unintended-outputs-via-introspection
Detecting Unintended Outputs via Introspection
lindsey-introspective-awareness-2026.md
Frontmatter (12 fields)
{
"doc": "lindsey-introspective-awareness-2026.md",
"author": null,
"context": "Models can distinguish artificially prefilled outputs from intentional responses by referencing prior internal representations; injecting a matching concept vector causes the model to retroactively accept the prefill as intentional.",
"enrichment": {
"is_stale": true
},
"norm_label": "Detecting Unintended Outputs via Introspection",
"source_url": null,
"graphify_id": "prefill_detection_exp",
"source_file": "lindsey-introspective-awareness-2026.md",
"imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/lindsey-introspective-awareness-2026/graph.json",
"extracted_type": "finding",
"source_location": "§7",
"graphify_file_type": "finding"
}
Outgoing (1)
Supports (1)
- Internality Criterion (concept)
Incoming (1)
answered_by (1)
Mentions (1)
- papers-typed
lindsey-introspective-awareness-2026.md