framework
active
framework:representation-engineering

Representation Engineering

A class of methods that read and modify a model's internal representations (activations) to monitor or control its behavior; self-other overlap (SOO) fine-tuning fits within this framework.
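The read-then-control pattern behind this description can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular paper. It assumes that "reading" means estimating a concept direction as the difference of mean activations between contrastive examples, and that "control" means adding a scaled copy of that direction to a hidden state. The function names (`concept_direction`, `read_score`, `steer`) and the 2-D toy vectors are hypothetical. Real work extracts activations from a chosen transformer layer, and published RepE reading methods can differ in detail (for example, using PCA instead of a plain mean difference).

```python
from statistics import fmean

# Activations are plain lists of floats so the sketch needs only the stdlib.
Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def concept_direction(positive: list[Vector], negative: list[Vector]) -> Vector:
    """Reading (assumed variant): mean(positive) - mean(negative)."""
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def read_score(hidden: Vector, direction: Vector) -> float:
    """Project a hidden state onto the concept direction (dot product)."""
    return sum(h * d for h, d in zip(hidden, direction))


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Control: shift the hidden state by alpha times the concept direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Toy 2-D "activations" standing in for a layer's hidden states (hypothetical data).
    positive = [[1.0, 0.0], [3.0, 0.0]]
    negative = [[0.0, 1.0], [0.0, 3.0]]
    direction = concept_direction(positive, negative)  # [2.0, -2.0]
    steered = steer([0.0, 0.0], direction, alpha=0.5)   # [1.0, -1.0]
    print(direction, steered, read_score(steered, direction))
```

Raising `alpha` pushes the hidden state further along the concept direction, and the projection score rises with it. That coupling is what lets the same direction serve both as a monitor and as a control knob.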

Neighborhood — ranked by edge-count

Thinkers (2)

thinker
  • Andy Zou
    introduces, studies
    Lead author of the Representation Engineering paper, which established the RepE paradigm
  • Lead author of the Activation Engineering paper; foundational for the additive steering paradigm

Concepts (1)

concept
  • The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs

Frameworks (4)

framework
  • The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
  • The central framework proposed in this paper: aligning AI internal representations of self and others to reduce deceptive behavior
  • ReflCtrl
    implements
    The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering
  • The paper's primary contribution: performs unbounded, fluency-constrained sweeps in semantically calibrated centroid units using psychological artifacts
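The self-other alignment idea listed above, which the description at the top ties to SOO fine-tuning, can be illustrated with the quantity such fine-tuning would try to shrink. The sketch below is hedged in three ways. It assumes overlap is measured as the mean squared distance between activations collected on paired prompts that differ only in referring to the model itself versus another agent. The function name `self_other_overlap_loss` and the toy vectors are hypothetical. The actual layers, prompt pairs, loss form, and any auxiliary terms used in the SOO work are not reproduced here.

```python
from statistics import fmean

Vector = list[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Sum of squared element-wise differences between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def self_other_overlap_loss(self_acts: list[Vector], other_acts: list[Vector]) -> float:
    """Mean per-dimension squared distance between paired 'self' and 'other' activations.

    Each pair is assumed to come from two prompts that differ only in whether they
    refer to the model itself or to another agent. Lower values mean more overlap.
    """
    if len(self_acts) != len(other_acts):
        raise ValueError("self and other activations must be paired one-to-one")
    per_pair = [squared_distance(s, o) / len(s) for s, o in zip(self_acts, other_acts)]
    return fmean(per_pair)


if __name__ == "__main__":
    # Hypothetical 2-D activations for two self/other prompt pairs.
    self_acts = [[1.0, 2.0], [0.0, 0.0]]
    other_acts = [[1.0, 0.0], [2.0, 0.0]]
    print(self_other_overlap_loss(self_acts, other_acts))  # 2.0
    print(self_other_overlap_loss(self_acts, self_acts))   # 0.0
```

In real fine-tuning this quantity would be computed on differentiable activations (for example, in PyTorch) and minimized by gradient descent over model parameters. The pure-Python version shows only the arithmetic of the overlap measure.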

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.