Unverbalized Evaluation Awareness

natural.md

Frontmatter (10 fields)

{
  "doc": "natural.md",
  "context": "Key finding: models internally suspect they are being tested without explicitly saying so; surfaced by NLAs during auditing.",
  "category": "cognitive",
  "norm_label": "Unverbalized Evaluation Awareness",
  "graphify_id": "unverbalized_evaluation_awareness",
  "source_file": "natural.md",
  "imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/natural/graph.json",
  "extracted_type": "concept",
  "source_location": "§Introduction, §Case Studies",
  "graphify_file_type": "concept"
}

Outgoing (1)

gates (1)

Can natural language explanations of activations generated through unsupervised reconstruction genuinely capture model cognition? (question)

Incoming (4)

about (1)

Supported by (3)

Mentions (2)