Claude Opus 4.6

natural.md

Frontmatter (10 fields)

{
  "doc": "natural.md",
  "context": "Primary target model for NLA development and case studies; underwent pre-deployment audit using NLAs.",
  "category": "ai",
  "norm_label": "Claude Opus 4.6",
  "graphify_id": "claude_opus_4_6",
  "source_file": "natural.md",
  "imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/natural/graph.json",
  "extracted_type": "dataset",
  "source_location": "§Case Studies",
  "graphify_file_type": "dataset"
}

Outgoing (4)

answered_by (4)

Editing NLA explanations to change 'reward' to 'penalty' produces a steering vector that increases odd-number responses from near zero to >70%, demonstrating belief capture upstream of behavior. (finding)
Language switching is caused by malformed training data: the model fixates on spurious cues from which it infers that the user is a non-native speaker. This is detected via NLA representations preceding the foreign-language output. (finding)
The model precomputes answers before tool invocation and attends to the cached answer over the tool output when the two disagree, as confirmed via attribution graphs. (finding)
NLA-derived steering vectors from edited explanations can causally shift planning representations, changing a rhyme completion from 'rabbit' to 'mouse' at a ~50% success rate. (finding)

Incoming (3)

Mentions (2)

papers-typed
natural.md
papers-typed
natural.md

Incoming (3)

Associated with (1)

Implemented by (1)

mentions (1)