Psychological Steering Framework

The paper's primary contribution: a framework that uses psychological artifacts to run unbounded, fluency-constrained sweeps of injection strength, measured in semantically calibrated centroid units

Neighborhood — ranked by edge-count

Papers (1)

paper

Psychological Steering of Large Language Models
introduces

Methods (5)

method

MDS Injection
uses
Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
Embedding-based Construct Logistic Classifier
uses
Logistic regressor on Qwen3Embedding-0.6B embeddings trained on construct statements; used to measure construct presence in alpha sweeps
Unbounded Alpha Sweep
uses
Procedure sweeping injection coefficient alpha in integer centroid-unit steps with early stopping on nonfluency to find optimal settings
Centroid Unit Calibration
uses
Novel calibration of injection strength as the distance from centroid midpoint to centroid; enables meaningful cross-layer comparison of alpha values
RoBERTa-large CoLA Fluency Classifier
uses
RoBERTa-large model trained on Corpus of Linguistic Acceptability used to score 0-to-1 fluency of generated text
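How the methods above could fit together, as a minimal NumPy sketch rather than the paper's implementation. It assumes two sets of self-statement activations (labeled here as high- and low-construct, which is an assumption), takes the centroid unit to be the distance from the midpoint of the two centroids to either centroid, and stops the sweep at the first alpha whose output scores below a fluency threshold. The function names, the 0.5 threshold, and the `generate`, `score_construct`, and `score_fluency` callables are all hypothetical placeholders for the model call, the embedding-based classifier, and the CoLA fluency classifier.

```python
from itertools import count

import numpy as np


def mean_difference_vector(high_acts, low_acts):
    """Return (unit-norm steering direction, centroid unit) from two activation sets.

    high_acts, low_acts: arrays of shape (n_statements, hidden_dim).
    Assumption: the two sets are high- and low-construct self-statements.
    """
    high_centroid = high_acts.mean(axis=0)
    low_centroid = low_acts.mean(axis=0)
    diff = high_centroid - low_centroid
    midpoint = (high_centroid + low_centroid) / 2.0
    # Centroid unit: distance from the midpoint to a centroid (half the centroid gap).
    unit = np.linalg.norm(high_centroid - midpoint)
    direction = diff / np.linalg.norm(diff)
    return direction, unit


def inject(hidden, direction, unit, alpha):
    """Shift a hidden state by alpha centroid units along the steering direction."""
    return hidden + alpha * unit * direction


def alpha_sweep(generate, score_construct, score_fluency, min_fluency=0.5):
    """Sweep alpha = 1, 2, 3, ... with no preset upper bound.

    Stops at the first alpha whose text falls below min_fluency and returns
    the (alpha, construct_score) pair with the highest construct score seen,
    or None if even alpha = 1 is nonfluent.
    """
    best = None
    for alpha in count(1):
        text = generate(alpha)
        if score_fluency(text) < min_fluency:
            break  # early stopping on nonfluency
        score = score_construct(text)
        if best is None or score > best[1]:
            best = (alpha, score)
    return best
```

Because alpha is expressed in centroid units, the same integer value corresponds to a comparable relative shift at each layer, even when raw activation norms differ. The sweep only terminates if fluency eventually drops below the threshold, so a hard cap on alpha may be a sensible safeguard in practice.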

Frameworks (1)

framework

Representation Engineering
extends
A class of methods that modify how models internally process representations; SOO fine-tuning fits within this framework

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The psychological steering framework generalizes beyond OCEAN to Dark Tetrad, CMNI, CFNI, and other psychological models · claim · 0.817
Supported by qualitative experiments showing fluent and coherent steering for three additional models
Geometry-Aware Steering Framework · framework · 0.761
The overarching theoretical framework proposed in the paper, asserting that steering interventions should be aligned with the geometric structure of the model's representation manifold.
Interpretability-Driven Feedback Steering · concept · 0.742
Framework of using internal-state representations to control or steer generative models; conceptually parallel to manifold steering in language models.
What is the right geometry for enabling principled steering of neural network behavior? · question · 0.740
The reframed steering problem the paper introduces
Sparse Autoencoder-based Framework for Steering Semantic Features · framework · 0.739
The main framework proposed for retrieving and steering high-order semantic features in LLMs via sparse autoencoders.
What Ethical Frameworks And Relationships Should Guide Human · question · 0.738
direction-based steering · concept · 0.738
Paradigm of finding the right direction in activation space (e.g., linear steering).
feeling-based steering method · method · 0.737
At each step, choose the action that most intensifies the feeling of the emerging whole.