Steering Along Manifolds to Control Neural Networks

/Users/antonborzov/Documents/Research.nosync/papers/wurgaft-goodfire-manifold-steering-2026.md

External IDs

arxiv

2605.05115

title_hash

6aeb0eb95c0267bf2b9a6929aad46b1fdb28e7d6

legacy_slug

wurgaft-goodfire-manifold-steering-2026

doi

10.48550/arxiv.2605.05115

Frontmatter (19 fields)

{
  "doi": "10.48550/arxiv.2605.05115",
  "pdf": "https://arxiv.org/pdf/2605.05115",
  "url": "https://arxiv.org/abs/2605.05115",
  "tags": [
    "representation-steering",
    "manifold-geometry",
    "feature-steering",
    "Llama-3",
    "cyclic-concepts",
    "mechanistic-interpretability",
    "goodfire"
  ],
  "year": 2026,
  "saved": "2026-05-14",
  "title": "Steering Along Manifolds to Control Neural Networks",
  "venue": "arXiv preprint",
  "status": "full-text-saved",
  "landing": "https://www.goodfire.ai/research/manifold-steering",
  "arxiv_id": "2605.05115",
  "published": "2026-05-07",
  "enrichment": {
    "is_stale": true
  },
  "affiliation": "Goodfire (+ Stanford / collaborators)",
  "openalex_id": "W7160544910",
  "openalex_year": 2026,
  "openalex_enriched_at": 1778977722,
  "openalex_match_title": "Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior",
  "openalex_cited_by_count": 0
}

Outgoing (4)

Incoming (19)

Authored by (16)

Atticus Geiger (thinker)
Can Rager (thinker)
Daniel Wurgaft (thinker)
Ekdeep Singh Lubana (thinker)
Eric Bigelow (thinker)
Jack Merullo (thinker)
Matthew Kowal (thinker)
Noah Goodman (thinker)
Owen Lewis (thinker)
Raphaël Sarfati (thinker)
Sheridan Feucht (thinker)
Tal Haklay (thinker)
Thomas Fel (thinker)
Thomas McGrath (thinker)
Usha Bhalla (thinker)
+1 more

Supported by (3)

References (30)

Methods of information geometry
referenced-only
Refusal in language models is mediated by a single direction
referenced-only
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
referenced-only
Learning phrase representations using RNN encoder-decoder for statistical machine translation
referenced-only
Understanding neural networks through representation erasure
referenced-only
Identifying and controlling important neurons in neural machine translation
referenced-only
An interpretability illusion for BERT
referenced-only
Locating and editing factual associations in GPT
referenced-only
Toy models of superposition
referenced-only
Dissecting recall of factual associations in auto-regressive language models
referenced-only
Discovering variable binding circuitry with desiderata
referenced-only
A geometric notion of causal probing
referenced-only
The geometry of truth: Emergent linear structure in large language model representations of true/false datasets
referenced-only
In-context learning dynamics with random binary sequences
referenced-only
Manifold learning: what, how, and why
referenced-only
Diffusion geometry
referenced-only
Not all language model features are one-dimensionally linear
referenced-only
Recurrent neural networks learn to store and generate sequences using non-linear representations
referenced-only
π0: A vision-language-action flow model for general robot control
referenced-only
Emergent symbol-like number variables in artificial neural networks
referenced-only
Language models use trigonometry to do addition
referenced-only
Steering off course: Reliability challenges in steering language models
referenced-only
On linear representations and pretraining data frequency in language models
referenced-only
Patterns and mechanisms of contrastive activation engineering
referenced-only
The origins of representation manifolds in large language models
referenced-only
On the emergence of linear analogies in word embeddings
referenced-only
Persona vectors: Monitoring and controlling character traits in language models
referenced-only
Semantic structure in large language model embeddings
referenced-only
Into the rabbit hull: From task-relevant concepts in DINO to Minkowski geometry
referenced-only
Belief dynamics reveal the dual nature of in-context learning and activation steering
referenced-only

Mentions (3)

papers-typed
steering.md
papers
/Users/antonborzov/Documents/Research.nosync/papers/wurgaft-goodfire-manifold-steering-2026.md
papers
wurgaft-goodfire-manifold-steering-2026-fulltext.md

Outgoing (4)

Contradicts (1)

Extends (1)

Implements (1)

Member of (1)
