concept

active

concept:representation-engineering-a-top-down-approach-to-ai-transparency-zou-et-al-2023

Representation engineering: A top-down approach to AI transparency (Zou et al., 2023)

Key prior work on representation engineering that ReflCtrl directly extends

Neighborhood — ranked by edge-count

paper

concept

Zou et al. 2023 - Representation Engineering: A Top-Down Approach to AI Transparency
same_as
Framework paper describing the broader class of methods within which SOO fine-tuning fits

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Representational Transparency · concept · 0.803
Property of conscious representations: they do not contain information about the fact that they are representations at the level of the representation itself
Representation Engineering · framework · 0.795
A class of methods that modify how models internally process representations; SOO fine-tuning fits within this framework
Representation engineering for large-language models: Survey and research challenges (Bartoszcze et al., 2025) · concept · 0.794
Survey of representation engineering methods cited as related work
Representation engineering successfully quantifies deception via high-accuracy steering vectors, establishing it as a measurable property of model representations · claim · 0.767
Key interpretive claim that deception has a tractable geometric signature in activation space
Representational abstraction of truth may emerge more clearly with model scale · claim · 0.761
Interpretation of weaker PCA separation and lower ASR in smaller models
Representation Design · concept · 0.757
The aspect of design dealing with data structures, modules, and implementation.
Bai et al. 2022: Constitutional AI — harmlessness from AI feedback · concept · 0.753
Paper on AI-feedback fine-tuning as alternative to human-feedback RLHF; cited as ref 20
The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. · quote · 0.752
Defines the core concept of the paper.