framework
active
framework:psychological-steering-framework

Psychological Steering Framework

The paper's primary contribution: a framework that uses psychological artifacts to perform unbounded, fluency-constrained sweeps of the injection coefficient alpha, measured in semantically calibrated centroid units

Neighborhood — ranked by edge-count

Methods (5)

method
  • Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
  • Logistic regressor on Qwen3-Embedding-0.6B embeddings, trained on construct statements; used to measure construct presence in alpha sweeps
  • Procedure sweeping injection coefficient alpha in integer centroid-unit steps with early stopping on nonfluency to find optimal settings
  • Novel calibration of injection strength: one unit is the distance from the centroid midpoint to a centroid; enables meaningful cross-layer comparison of alpha values
  • RoBERTa-large model trained on the Corpus of Linguistic Acceptability (CoLA), used to assign a 0-to-1 fluency score to generated text
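
The methods above can be read together as one loop: build a mean-difference vector, express alpha in centroid units, and step alpha upward until fluency breaks. The following is a minimal sketch of that reading, not the paper's implementation. The function names, the `min_fluency` threshold, the `max_alpha` safety cap, and the assumption that "centroid" means the mean activation of each of two statement sets are all hypothetical. The two scorers are toy stand-ins that act on vectors; in the framework they would be the logistic regressor and the fluency model applied to generated text.

```python
import numpy as np

def mean_difference_vector(pos_acts, neg_acts):
    """Return a unit steering direction and the size of one centroid unit.

    Assumption: each row is one statement's activation, and the two
    centroids are the per-set means.
    """
    c_pos = pos_acts.mean(axis=0)
    c_neg = neg_acts.mean(axis=0)
    diff = c_pos - c_neg
    midpoint = (c_pos + c_neg) / 2.0
    # One centroid unit: distance from the midpoint to a centroid.
    unit = float(np.linalg.norm(c_pos - midpoint))
    direction = diff / np.linalg.norm(diff)
    return direction, unit

def sweep_alpha(hidden, direction, unit, score_construct, score_fluency,
                min_fluency=0.5, max_alpha=100):
    """Step alpha in integer centroid units; stop early on nonfluency.

    max_alpha is only a safety cap for this sketch; the sweep itself is
    meant to be bounded by the fluency check, not by a fixed range.
    """
    best_alpha, best_score = 0, float("-inf")
    for alpha in range(1, max_alpha + 1):
        steered = hidden + alpha * unit * direction
        if score_fluency(steered) < min_fluency:
            break  # early stopping: output is no longer fluent
        score = float(score_construct(steered))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score

# Toy usage with made-up 2-D "activations" and stand-in scorers.
pos = np.array([[2.0, 0.0], [4.0, 0.0]])
neg = np.array([[-2.0, 0.0], [-4.0, 0.0]])
direction, unit = mean_difference_vector(pos, neg)
toy_construct = lambda h: h[0]                      # higher = more construct
toy_fluency = lambda h: 1.0 if h[0] <= 9.0 else 0.0  # breaks past 9
best_alpha, best_score = sweep_alpha(np.zeros(2), direction, unit,
                                     toy_construct, toy_fluency)
print(best_alpha, best_score)
```

Because alpha multiplies a per-layer centroid unit rather than a raw norm, the same integer alpha corresponds to a comparable semantic displacement at different layers, which is the cross-layer comparability the calibration bullet describes.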

Frameworks (1)

framework
  • A class of methods that modify how models internally process representations; SOO fine-tuning fits within this framework

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.