Representation Engineering
Type: framework · Status: active · ID: framework:representation-engineering
A class of methods that modify how a model internally processes its representations; self-other overlap (SOO) fine-tuning fits within this framework.
Neighborhood (ranked by edge count)
Papers (4)
Thinkers (2)
- Andy Zou (introduces, studies): Lead author of the Representation Engineering paper, which established the RepE paradigm
- Alexander Matt Turner (studies): Lead author of the Activation Engineering paper, which is foundational to the additive steering paradigm
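The additive steering paradigm credited to Activation Engineering can be illustrated with a minimal sketch. It rests on several assumptions. Activations are plain Python lists standing in for one layer's hidden states. The steering vector is the difference between activations for a contrast pair of prompts. The helper names `steering_vector` and `add_steering` and the coefficient `alpha` are hypothetical. A real setup would apply the addition inside a model's forward pass rather than to toy vectors.

```python
from typing import List

Vector = List[float]


# Hypothetical helper: the steering direction is the difference between
# activations recorded for a contrast pair of prompts (e.g. "Love" vs "Hate").
def steering_vector(act_pos: Vector, act_neg: Vector) -> Vector:
    if len(act_pos) != len(act_neg):
        raise ValueError("activations must have the same dimension")
    return [p - n for p, n in zip(act_pos, act_neg)]


# Hypothetical helper: additive steering h' = h + alpha * v at a single layer.
def add_steering(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must have the same dimension")
    return [h + alpha * v for h, v in zip(hidden, direction)]


# Toy 3-d vectors stand in for a real layer's activations.
v = steering_vector([1.0, 0.5, 0.0], [0.0, 0.5, 1.0])  # -> [1.0, 0.0, -1.0]
steered = add_steering([0.2, 0.2, 0.2], v, alpha=2.0)
print(steered)  # -> [2.2, 0.2, -1.8]
```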
Concepts (1)
- Endogenous Steering Resistance (contradicts): Inference-time recovery from irrelevant activation steering in LLMs; the central phenomenon introduced by this paper
Frameworks (4)
- The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
- The central framework proposed in this paper: aligning AI internal representations of self and others to reduce deceptive behavior
- ReflCtrl (implements): The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering
- The paper's primary contribution: performs unbounded, fluency-constrained sweeps in semantically calibrated centroid units using psychological artifacts
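One entry in the list above describes concepts as approximately linear directions in representation space. That view suggests a simple read-out: the scalar projection of an activation onto a concept direction. The sketch below is a hedged illustration under several assumptions. The direction is already known, and plain Python lists replace a tensor library. The helpers `concept_score` and `remove_concept` are hypothetical, and this is not the method of any paper listed here.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


# Hypothetical helper: strength of a concept = projection onto its unit direction.
def concept_score(hidden: Vector, direction: Vector) -> float:
    return dot(hidden, normalize(direction))


# Hypothetical helper: project the concept out, h - (h . u) u.
def remove_concept(hidden: Vector, direction: Vector) -> Vector:
    u = normalize(direction)
    s = dot(hidden, u)
    return [h - s * x for h, x in zip(hidden, u)]


h = [3.0, 4.0, 0.0]           # toy activation
d = [0.0, 2.0, 0.0]           # toy concept direction (unnormalized)
print(concept_score(h, d))    # -> 4.0
print(remove_concept(h, d))   # -> [3.0, 0.0, 0.0]
```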
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The aspect of design dealing with data structures, modules, and implementation.
- Key prior work on representation engineering that ReflCtrl directly extends
- Survey of representation engineering methods cited as related work
- A one-dimensional curved manifold in internal activation space; the paper demonstrates its alignment with a behavior manifold.
- How a neural network encodes a semantic concept internally, argued to be better captured by manifolds than by atomic features.
- CIMC's characterization of part of the solution to the Hard Problem: insight into the structural necessities of phenomenal representation
- The central question of whether representational geometry implies corresponding computational structure
- The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.
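The rule behind the "Related by similarity" list (cosine ≥ 0.65 and no typed edge) can be sketched as a filter over entity embeddings. Everything beyond that rule is an assumption, because this page does not describe the underlying embedding model or edge store. That includes the embedding vectors themselves, storing typed edges as unordered ID pairs, the toy IDs, and the helper `similarity_candidates`, all of which are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


# Hypothetical helper: entities with cosine >= threshold to `target` but no
# typed edge to it, sorted by descending similarity.
def similarity_candidates(
    target: str,
    embeddings: Dict[str, Vector],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    ref = embeddings[target]
    out: List[Tuple[str, float]] = []
    for eid, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if eid == target or frozenset((target, eid)) in typed_edges:
            continue
        score = cosine(ref, vec)
        if score >= threshold:
            out.append((eid, score))
    return sorted(out, key=lambda pair: pair[1], reverse=True)


# Toy 2-d embeddings; IDs are illustrative only.
emb = {
    "repe": [1.0, 0.0],
    "steering": [0.9, 0.1],  # similar, but already has a typed edge
    "manifold": [0.8, 0.6],  # cosine close to 0.8, no edge: a candidate
    "design": [0.0, 1.0],    # cosine 0.0: below the threshold
}
edges = {frozenset(("repe", "steering"))}
print(similarity_candidates("repe", emb, edges))  # one candidate: "manifold"
```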