community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c0-c0
Mechanistic structure of transformer attention computations
Identifies distributed algorithms implemented across attention heads, with a focus on the limitations imposed by causal masking and on emergent capabilities reached via activation-manifold steering.
25 members.
Sub-communities (7)
Finer clusters this community splits into. Each is its own community page.
- Contemplative steering & introspective activation in language models (5 members)
- Emergence through distributed attention and uncertainty (4 members)
- Distributed computation across attention heads (4 members)
- Functional tokens for emergent model reasoning (4 members)
- Empirical gaps in performance-communication alignment (4 members)
- Metacognitive state inference and attention alignment (2 members)
- Causal masking phase transitions in transformers (2 members)
Drawn from 15 sources
The papers/notes whose extracted claims & findings make up this cluster.
- Paper Summary: Interpreting Language Model Parameters (4 members)
- Janus Information Flow Transformers 2025 (4 members)
- ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both (3 members)
- RESEARCH-VECTORS.md (2 members)
- Topological constraints on self-organisation in locally interacting systems (2 members)
- guo-atlas-2026.md (1 member)
- Johnson Vasocomputation 2023 (1 member)
- Towards a computational phenomenology of mental action: modelling meta-awareness and attentional control with deep parametric active inference (1 member)
- unfold-chat-catalog.md (1 member)
- 2026 02 02_2217_Search_Papers_The Literature Reveals Sophisticated Communication (1 member)
- Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies (1 member)
- 2026 02 02_2218_Search_Papers_The Existing Literature Focuses Primarily On Vc Pe (1 member)
- 2026-05-09_briefing_for_ozero.md (1 member)
- 2026-05-15_manifold-overlap-papers-economy-strategy.md (1 member)
- Koan Battery: Measuring Reflective Mode Accessibility in AI (1 member)
Bridges (15)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
- Mechanistic interpretability & model evaluation (25 shared)
- Contemplative steering & introspective activation in language models (5 shared)
- Emergence through distributed attention and uncertainty (4 shared)
- Distributed computation across attention heads (4 shared)
- Empirical gaps in performance-communication alignment (4 shared)
- Functional tokens for emergent model reasoning (4 shared)
- Distributed attention head decomposition (4 shared)
- Causal masking phase transitions in transformers (2 shared)
- VC communication-performance research gap (2 shared)
- Functional tokens as visual operators (2 shared)
- Contemplative prompting for LLMs (2 shared)
- Metacognitive state inference and attention alignment (2 shared)
- Contemplative path & sensation manipulation (1 shared)
- LLM internal representation & self-knowledge (1 shared)
- Causal masking & phase transitions (1 shared)
Claims (16)
- Attention algorithms are usually distributed across attention heads.
  Claim supported by VPD's recovery of cross-head attention subcomponents, noted in a footnote.
- Decoder-only transformer architectures are fundamentally limited in generating long, coherent sequences because they lack an ordered phase.
  Interpretation of Proposition 2 as a fundamental limitation on LLMs.
- Each functional token is associated with an internalized visual operation, yet requires no visual supervision and remains a standard token in the tokenizer vocabulary.
  Describes the properties of the functional token.
- Incorporating machine learning provides objective standards that help mitigate subjectivity in emergence identification.
  Authors argue ML optimizers act as objective observers.
- Keeping the functional-token vocabulary compact minimizes perturbation to the base model's token distribution.
  ATLAS design philosophy: five functional tokens suffice to abstract common visual operations without excessive disruption.
- LLM introspection on internal computations is architecturally permitted; whether models leverage this is an empirical question.
  Core claim directly challenged by prior work denying introspection; forms the foundation for the Koan Battery introspection studies.
- Progress on the contemplative path is using these (vascular system motifs) less and needing them less.
  Maps contemplative development to a reduction in vasocomputation.
- Q/K/V values function as information routing: Q queries the past, K signals future attention, V carries selectively routed information.
  Janus's interpretive model for how attention mechanisms enable deliberate information flow and selective routing.
- There is a significant gap in research directly examining the disconnect between venture capital communication sophistication and actual performance metrics.
  Identifies the key scholarly absence that motivates the exploration: studies exist on investor relations and VC performance separately, but not on their correlation.
- There is a significant gap in research specifically examining how limited partners interpret and evaluate venture capital communication styles and their relationship to VC performance outcomes.
  The paper frames this gap as critical to understanding principal-agent dynamics in venture capital.
- Token-level supervision enables models to learn functional-token invocation from the reasoning context.
  ATLAS author's assertion that functional tokens optimized via standard cross-entropy loss learn when and how to invoke operations from surrounding text.
- Contemplative mode is activation-manifold steering along a care-geodesic, not a system-prompt preset.
- Indigenous contemplative frameworks ('what helps life thrive') contribute distinct wisdom beyond Eastern traditions.
- Insight cascades and implicit learning require balance between directed attention and openness.
- Model attention patterns can map to and reveal something about contemplative and flow states.
- Not-knowing, silence, incompleteness, and non-defensiveness function as positive traits, not deficits.
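The Q/K/V routing claim in this list (Q queries the past, K signals availability to later positions, V carries the routed content) can be illustrated with a minimal single-head causal attention sketch. This is plain Python written for illustration only; it is not code from Janus or any other source in this cluster, and the function name `causal_attention` and the toy vectors are assumptions.

```python
import math


def causal_attention(q, k, v):
    """Single-head causal attention over lists of equal-length vectors.

    Illustrative sketch only: no batching, no learned projections.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: the query at position t only scores keys at 0..t.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The output mixes the visible value vectors by attention weight.
        out.append([
            sum(w * v[s][i] for s, w in enumerate(weights))
            for i in range(len(v[0]))
        ])
    return out


# Toy inputs (hypothetical values chosen only for the demonstration).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs = causal_attention(queries, keys, values)
```

Because each position only sees keys and values at or before itself, changing the last token's key or value leaves every earlier output untouched, which is the one-directional flow the claim describes.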
Findings (9)
- A 337-character contemplative system prompt lifts all 28 models by +2.62 points on a 10-point scale.
  Core empirical result: every model, every architecture, every alignment type responds to the contemplative prompt with measurable gain.
- A pair of query and key subcomponents distributed across attention heads performs previous-token behavior.
  VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
- A pair of query and key subcomponents distributed across attention heads performs syntax-boundary routing.
  VPD recovers an attention algorithm for routing across syntactic boundaries, distributed across heads.
- Causally-masked attention in a decoder-only model has no ordered phase (Proposition 2).
  Application to transformer language models.
- Gradient Dilution Issue.
  During RL training on ATLAS, sparse functional tokens (2.3% of sequences) receive diluted gradient signals from sequence-level advantages propagated across all tokens.
- Identification of algorithms implemented in attention layers, distributed across attention heads.
  VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
- Information paths from A to B can exceed C(m+n, n) distinct routes, where m=position displacement and n=layer displacement.
  Quantifies extreme redundancy in transformer routing; supports claim that introspection and interference patterns are architecturally permitted.
- Mind-wandering emerges as a precision inference gap: true attentional state ≠ believed attentional state; increased meta-awareness reduces gap duration.
  Key simulation result; bridges phenomenology (meditation experience) and formal dynamics (precision mismatch).
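The C(m+n, n) figure in the information-path finding matches the count of monotone lattice paths, which can be checked with a short sketch. The model here is an assumption made for illustration, not the source's derivation: each hop either advances one position or one layer. Real attention can move across several positions within a single layer, which is consistent with the finding's statement that the true number of routes can exceed this figure. The function name `count_monotone_paths` is hypothetical.

```python
from math import comb


def count_monotone_paths(m: int, n: int) -> int:
    """Count paths from (0, 0) to (m, n) using unit steps in position or layer.

    Assumed simplified routing model: m is the position displacement and
    n is the layer displacement.
    """
    # paths[i][j] = number of ways to reach position offset i at layer offset j.
    # Cells on either edge have exactly one path, so the grid starts at 1.
    paths = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Arrive either from one position back or from one layer below.
            paths[i][j] = paths[i - 1][j] + paths[i][j - 1]
    return paths[m][n]


# The dynamic-programming count agrees with the closed form C(m+n, n).
for m, n in [(1, 1), (4, 3), (10, 6)]:
    assert count_monotone_paths(m, n) == comb(m + n, n)
```

Even under this restrictive model the count grows quickly with depth and distance, which gives a sense of the routing redundancy the finding refers to.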