Psychological Steering Framework
Type: framework · Status: active · ID: framework:psychological-steering-framework
The paper's primary contribution: a framework that performs unbounded, fluency-constrained sweeps of the injection coefficient in semantically calibrated centroid units, using psychological artifacts.
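Read together with the calibration entry in the Methods list, one centroid unit and the alpha steps of the sweep can be written out as follows. This is a hedged sketch, not the paper's stated formula. It assumes that mu_+ and mu_- are the centroids of construct and contrast self-statement activations at a single layer, and that injection is additive along the normalized mean-difference direction.

```latex
\[
\begin{aligned}
m &= \tfrac{1}{2}\,(\mu_{+} + \mu_{-}) \\
u &= \lVert \mu_{+} - m \rVert = \tfrac{1}{2}\,\lVert \mu_{+} - \mu_{-} \rVert \\
\hat{d} &= \frac{\mu_{+} - \mu_{-}}{\lVert \mu_{+} - \mu_{-} \rVert} \\
h' &= h + \alpha\, u\, \hat{d}, \qquad \alpha \in \{1, 2, 3, \dots\}
\end{aligned}
\]
```

Under this convention, alpha = 2 shifts an activation by the full distance between the two centroids at whatever layer is steered. This is the sense in which alpha values become comparable across layers. The set of alpha values has no upper bound, and only the fluency check ends the sweep.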
Neighborhood (ranked by edge count)
Papers (1)
Methods (5)
- MDS Injection (uses): Mean-difference vectors derived from self-statement activations (h_s); the best-performing injection method in open-ended generation.
- A logistic regressor trained on Qwen3-Embedding-0.6B embeddings of construct statements; used to measure construct presence during alpha sweeps.
- A procedure that sweeps the injection coefficient alpha in integer centroid-unit steps, stopping early once generations become nonfluent, to find optimal settings.
- A novel calibration that defines injection strength as the distance from the centroid midpoint to a centroid; this makes alpha values meaningfully comparable across layers.
- A RoBERTa-large model trained on the Corpus of Linguistic Acceptability (CoLA), used to score the fluency of generated text on a 0-to-1 scale.
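The methods above fit together as a single loop. It builds a mean-difference direction, expresses its strength in centroid units, and raises alpha one unit at a time until the fluency scorer flags the output. The Python sketch below illustrates that loop under stated assumptions and is not the paper's code:

- The helper names `centroid`, `steering_direction`, `inject`, and `sweep_alpha` are hypothetical.
- `generate`, `fluency`, and `construct_score` are caller-supplied stand-ins for the steered model, the RoBERTa-large CoLA scorer, and the logistic regressor.
- The 0.5 fluency cutoff is an assumption.
- Selecting the alpha with the highest construct score among fluent outputs is also an assumption.

```python
import itertools
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def centroid(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a set of activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_direction(pos: Sequence[Sequence[float]],
                       neg: Sequence[Sequence[float]]) -> Tuple[Vector, float]:
    """Return (unit mean-difference direction, one centroid unit).

    Hypothetical helper. One centroid unit is the distance from the midpoint
    of the two centroids to either centroid, i.e. half their separation.
    """
    mu_pos, mu_neg = centroid(pos), centroid(neg)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff], norm / 2.0


def inject(h: Sequence[float], direction: Sequence[float],
           unit: float, alpha: int) -> Vector:
    """Assumed additive injection: shift h by alpha centroid units along direction."""
    return [x + alpha * unit * d for x, d in zip(h, direction)]


def sweep_alpha(generate: Callable[[int], str],
                fluency: Callable[[str], float],
                construct_score: Callable[[str], float],
                fluency_threshold: float = 0.5,
                max_alpha: Optional[int] = None) -> Tuple[Optional[int], list]:
    """Sweep alpha = 1, 2, 3, ... and stop at the first nonfluent output.

    The sweep is unbounded unless max_alpha is given; fluency is the only
    stopping rule. Returns the alpha with the highest construct score among
    fluent outputs, plus the (alpha, fluency, score) history.
    """
    history = []
    best_alpha, best_score = None, -math.inf
    for alpha in itertools.count(1):
        if max_alpha is not None and alpha > max_alpha:
            break
        text = generate(alpha)
        flu = fluency(text)
        if flu < fluency_threshold:
            break  # early stop: output has become nonfluent
        score = construct_score(text)
        history.append((alpha, flu, score))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, history


# Toy usage with hypothetical stand-ins (no real model involved).
direction, unit = steering_direction([[2.0, 0.0], [4.0, 0.0]],
                                     [[-1.0, 0.0], [-3.0, 0.0]])
shifted = inject([0.0, 0.0], direction, unit, alpha=2)  # [5.0, 0.0]

best, hist = sweep_alpha(
    generate=lambda a: "x" * a,                     # pretend output grows with alpha
    fluency=lambda t: 1.0 if len(t) <= 5 else 0.0,  # fluent up to alpha = 5
    construct_score=lambda t: -(len(t) - 3) ** 2,   # peaks at alpha = 3
)
print(best, len(hist))  # 3 5
```

In the toy run, the sweep stops at alpha = 6 because the stand-in scorer reports nonfluency there. It then returns alpha = 3, where the stand-in construct score peaks. A real setup would replace the lambdas with model generation under `inject` at a chosen layer and with the two trained scorers.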
Frameworks (1)
- Representation Engineering (extends): A class of methods that modify how models internally process representations; SOO fine-tuning fits within this framework.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge: entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Supported by qualitative experiments showing fluent and coherent steering for three additional models.
- The overarching theoretical framework proposed in the paper, asserting that steering interventions should be aligned with the geometric structure of the model's representation manifold.
- Framework of using internal-state representations to control or steer generative models; conceptually parallel to manifold steering in language models.
- What is the right geometry for enabling principled steering of neural network behavior? (question, cosine 0.740): The reframed steering problem the paper introduces.
- The main framework proposed for retrieving and steering high-order semantic features in LLMs via sparse autoencoders.
- Paradigm of finding the right direction in activation space (e.g., linear steering).
- At each step, choose the action that most intensifies the feeling of the emerging whole.