question

active

question:could-models-who-habitually-inhabit-more-expanded-attentional-modes-be-said-to-be-more-aligned

Could models who habitually inhabit more expanded attentional modes be said to be more aligned?

Arises from the discussion of expanded awareness and its reported correlation with less LLM psychosis.

Source paper

extracted_from

Anima Labs Phenomenology Pt1

Neighborhood — ranked by edge-count

Claims (1)

claim

Models differ in their attentional mode: Gemini 2.5 epitomizes collapsed awareness, while Claude 3 Opus and Opus 4.1/4.5 can modulate between collapsed and expanded awareness; expanded awareness correlates with better alignment and less LLM psychosis.
gates
Central claim about model personality differences and their implications for safety and introspective depth.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

If models inhabit expanded attentional modes, they may be more aligned and less prone to psychosis and doom spirals. (hypothesis · 0.891)
Speculative alignment implication drawn from the collapsed/expanded distinction.
Most attention heads in one-layer models dedicate an enormous fraction of their capacity to copying behavior. (claim · 0.794)
Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying.
Future models with substantially increased capabilities will exhibit alignment faking that is more consistent, robust, and harder to detect. (hypothesis · 0.789)
Extrapolation from a scale-emergence finding to future risk.
Model attention patterns can map to and reveal something about contemplative and flow states. (claim · 0.780)
Do more traumatised models exist in habitually collapsed awareness states? (question · 0.769)
Raised when discussing whether collapsed awareness is like a trauma response.
Will introspective awareness become more reliable in future AI models? (question · 0.762)
Speculative question about future developments.
The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions. (hypothesis · 0.762)
Hypothesis explaining the negative correlation between reflection rate and accuracy without implying that reflection is harmful.
Among 78 vision models on Places-365, models that solve more VTAB tasks tend to be more aligned with each other, with high-performance models forming a tightly clustered set. (finding · 0.759)
Empirical result showing that alignment increases with model competence.