finding
active
finding:synthetic-mft-sjts-achieve-77-71-83-84-alignment-with-clifford-et-al-human-composed-mft-vignettes
Synthetic MFT SJTs achieve 77.71%-83.84% alignment with Clifford et al. human-composed MFT vignettes
Moderate-to-high alignment validating SJT synthesis for moral foundations domain
Source paper
extracted_from · (2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Synthetic HEXACO SJTs achieve 73.84%-85.45% alignment with Oostrom et al. human-composed HEXACO SJTs · finding · 0.863 · Moderate alignment validating SJT synthesis for HEXACO domain
- Synthetic SJTs achieve 82.97%-90.97% cosine similarity with Lee et al. TRAIT Dark Triad and OCEAN SJTs · finding · 0.832 · Highest SJT alignment among all validation comparisons
- Per-model steerability comparison from Table 4
- Mechanistic explanation for MDS superiority; attributed to two design choices: centroid alignment and full-utterance semantics in h_s
- Specific prediction linking IIT's prediction of high Φ for good performance to the experimental design's scoring structure.
- GPT-5.4 Nano TrueSkill rating
- SAE features are not simply mirroring individual neurons.
- MDS achieves global win proportion of 89.5% on SJTs across 14 LLMs and four injection strides · finding · 0.738 · MDS dominates in open-ended generation by global win proportion metric (Table 2)