Representation Engineering

A class of methods that modify how models internally process representations; Self-Other Overlap (SOO) fine-tuning fits within this framework.

Neighborhood, ranked by edge count

thinker

Andy Zou
introduces, studies
Lead author of the Representation Engineering paper, which established the RepE paradigm
Alexander Matt Turner
studies
Lead author of the Activation Engineering paper, which is foundational for the additive steering paradigm
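Additive steering, the paradigm associated with the Activation Engineering work above, can be illustrated with toy arrays. This is a minimal sketch under stated assumptions, not code from either paper: hidden states are plain NumPy vectors, and the steering direction is a unit-norm difference of mean activations over two contrasting toy sets (Activation Engineering derives its vector from a contrasting prompt pair; the averaging here is a swapped-in simplification). The names `steering_vector`, `apply_steering` and the coefficient `alpha` are hypothetical.

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference of mean activations (rows = examples, cols = hidden dims).

    Hypothetical helper: a difference-of-means stand-in for a contrast-pair vector.
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def apply_steering(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Additive steering: shift the hidden state by alpha along the direction."""
    return hidden + alpha * direction

rng = np.random.default_rng(0)
d_model = 16

# Toy activations: the two sets are offset in opposite directions along axis 3,
# which plays the role of a linearly represented concept.
concept_axis = np.zeros(d_model)
concept_axis[3] = 1.0
pos = rng.normal(size=(64, d_model)) + 2.0 * concept_axis
neg = rng.normal(size=(64, d_model)) - 2.0 * concept_axis

v = steering_vector(pos, neg)
h = rng.normal(size=d_model)
h_steered = apply_steering(h, v, alpha=4.0)

# Because v has unit norm, the projection onto v grows by exactly alpha.
print(round(float(h_steered @ v - h @ v), 6))  # → 4.0
```

In a real model the same addition would typically be applied to the residual stream at one chosen layer through a forward hook; the layer and `alpha` are tuning choices here, not values taken from the cited work.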

concept

Endogenous Steering Resistance
contradicts
The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs

framework

Linear Representation Hypothesis
cites
The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
Self-Other Overlap (SOO) Fine-Tuning
extends
The central framework proposed in this paper: aligning AI internal representations of self and others to reduce deceptive behavior
ReflCtrl
implements
The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering
Psychological Steering Framework
extends
The paper's primary contribution: a framework that performs unbounded, fluency-constrained sweeps, measured in semantically calibrated centroid units, using psychological artifacts
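The idea behind the Self-Other Overlap entry, pulling a model's internal representations of self and others together, can be shown with a toy objective. This is a simplified sketch resting on several assumptions, not the paper's training code: a single linear map `W` stands in for a model layer; paired rows of `x_self` and `x_other` stand in for activations on matched self- and other-referencing prompts; overlap is scored as the mean squared difference between the mapped pairs; and an L2 pull toward the original weights (the hypothetical `anchor_weight`) stands in for whatever capability-preserving term a real setup would use.

```python
import numpy as np

def soo_loss(W: np.ndarray, x_self: np.ndarray, x_other: np.ndarray) -> float:
    """Mean squared difference between layer outputs on paired self/other inputs."""
    diff = (x_self - x_other) @ W.T  # shape: (n_pairs, d_out)
    return float(np.mean(diff ** 2))

def soo_grad(W: np.ndarray, x_self: np.ndarray, x_other: np.ndarray) -> np.ndarray:
    """Analytic gradient of soo_loss with respect to W."""
    D = x_self - x_other  # shape: (n_pairs, d_in)
    n_pairs, d_out = D.shape[0], W.shape[0]
    return (2.0 / (n_pairs * d_out)) * (W @ D.T @ D)

def fine_tune(W0, x_self, x_other, lr=0.1, steps=200, anchor_weight=0.1):
    """Gradient descent on the overlap loss plus anchor_weight * ||W - W0||_F^2.

    The anchor term is a hypothetical stand-in for a capability-preserving loss.
    """
    W = W0.copy()
    for _ in range(steps):
        grad = soo_grad(W, x_self, x_other) + 2.0 * anchor_weight * (W - W0)
        W -= lr * grad
    return W

rng = np.random.default_rng(0)
d_in, d_out, n_pairs = 16, 8, 32
W0 = rng.normal(size=(d_out, d_in))
x_self = rng.normal(size=(n_pairs, d_in))   # toy "self" activations
x_other = rng.normal(size=(n_pairs, d_in))  # toy "other" activations

before = soo_loss(W0, x_self, x_other)
W = fine_tune(W0, x_self, x_other)
after = soo_loss(W, x_self, x_other)
print(after < before)  # → True
```

In this toy, the anchor term matters: minimizing overlap alone would let `W` shrink along every self/other difference direction, a degenerate solution that a capability-preserving term is meant to rule out.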

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
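The selection rule stated in the header (cosine ≥ 0.65, no typed edge) can be sketched as a filter over entity embeddings. This assumes each entity has an embedding vector and that typed edges are stored as a set of unordered name pairs; the vectors below are made-up toy values, not the embeddings behind the scores in this list, and `candidate_neighbors` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Set, FrozenSet, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def candidate_neighbors(
    target: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold to `target` that share no typed edge with it,
    sorted by descending similarity."""
    out = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed relation.
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            out.append((name, round(score, 3)))
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional embeddings (hypothetical values).
embeddings = {
    "Representation Engineering": [0.9, 0.4, 0.1],
    "Linear Representation Hypothesis": [0.8, 0.5, 0.2],  # has a typed edge (cites)
    "Representation Design": [0.7, 0.3, 0.3],
    "Representational dynamics": [0.5, 0.6, 0.5],
    "Unrelated topic": [0.0, 0.1, 1.0],
}
typed_edges = {frozenset(("Representation Engineering", "Linear Representation Hypothesis"))}

result = candidate_neighbors("Representation Engineering", embeddings, typed_edges)
print(result)  # → [('Representation Design', 0.963), ('Representational dynamics', 0.806)]
```

The typed-edge check runs before the similarity test, so a strongly similar entity that is already linked (here, the Linear Representation Hypothesis) never appears as a candidate.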

Representation Design · concept · 0.827
The aspect of design dealing with data structures, modules, and implementation.
Representation engineering: A top-down approach to AI transparency (Zou et al., 2023) · concept · 0.795
Key prior work on representation engineering that ReflCtrl directly extends
Representation engineering for large-language models: Survey and research challenges (Bartoszcze et al., 2025) · concept · 0.791
Survey of representation engineering methods cited as related work
representation manifold · concept · 0.788
One-dimensional curved manifold in internal activation space; the paper demonstrates its alignment with the behavior manifold.
concept representation · concept · 0.784
How a neural network encodes a semantic concept internally, argued to be better captured by manifolds than by atomic features.
Representational Disentanglement · concept · 0.774
CIMC's characterization of part of the solution to the Hard Problem: insight into the structural necessities of phenomenal representation
Structure in representations · concept · 0.774
The central question of whether representational geometry implies corresponding computational structure
Representational dynamics · concept · 0.772
The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.