dataset
pending-review
dataset:claude-opus-4-6
Claude Opus 4.6
natural.md
Frontmatter (10 fields)
{
"doc": "natural.md",
"context": "Primary target model for NLA development and case studies; underwent pre-deployment audit using NLAs.",
"category": "ai",
"norm_label": "Claude Opus 4.6",
"graphify_id": "claude_opus_4_6",
"source_file": "natural.md",
"imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/natural/graph.json",
"extracted_type": "dataset",
"source_location": "§Case Studies",
"graphify_file_type": "dataset"
}
Outgoing (4)
answered_by (4)
- Editing NLA explanations to change 'reward' to 'penalty' produces a steering vector that increases odd-number responses from near zero to >70%, demonstrating belief capture upstream of behavior. (finding)
- Language switching is caused by malformed training data: the model fixates on spurious cues, inferring the user's non-native status; this was detected via NLA representations preceding the foreign-language output. (finding)
- The model precomputes answers before tool invocation and attends to the cached answer over the tool output when the two disagree, as confirmed via attribution graphs. (finding)
- NLA-derived steering vectors from edited explanations can causally shift planning representations, changing a rhyme completion from 'rabbit' to 'mouse' at a ~50% success rate. (finding)
Mentions (2)
- papers-typed natural.md
- papers-typed natural.md