finding
pending-review
finding:production-models-show-zero-false-positives-on-thought-injection-detection
Production models show zero false positives on thought injection detection
lindsey-introspective-awareness-2026.md
Frontmatter (12 fields)
{
"doc": "lindsey-introspective-awareness-2026.md",
"author": null,
"context": "Opus 4.1 never claims to detect injected thought when none applied (0/100 trials); production Claude models maintain essentially zero false positive rate.",
"enrichment": {
"is_stale": true
},
"norm_label": "Production models show zero false positives on thought injection detection",
"source_url": null,
"graphify_id": "false_positive_control",
"source_file": "lindsey-introspective-awareness-2026.md",
"imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/lindsey-introspective-awareness-2026/graph.json",
"extracted_type": "finding",
"source_location": "§5.3, §5.7",
"graphify_file_type": "finding"
}
Outgoing (1)
Supports (1)
- Self-report of Injected Thoughts (finding)
Incoming (0)
None.
Mentions (1)
- papers-typed
lindsey-introspective-awareness-2026.md