Natural Language Autoencoders (NLAs)

natural.md

Frontmatter (10 fields)

{
  "doc": "natural.md",
  "context": "Core unsupervised method for generating natural language explanations of LLM activations through a verbalizer-reconstructor pair trained with RL.",
  "category": "ai",
  "norm_label": "Natural Language Autoencoders (NLAs)",
  "graphify_id": "natural_language_autoencoders",
  "source_file": "natural.md",
  "imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/natural/graph.json",
  "extracted_type": "method",
  "source_location": "§Introduction",
  "graphify_file_type": "method"
}

Outgoing (12)

about (2)

Neuronpedia NLA Frontend (artifact)
NLA Training Code Repository (artifact)

Associated with (3)

Cites (1)

Transformer Circuits (venue)

Extends (1)

Activation Oracles (AO) (method)

Implements (5)

Incoming (12)

about (4)

Associated with (1)

Contradicted by (1)

Supported by (6)

Mentions (1)