community · active · leiden_hybrid_concepts · label: sonnet

community:leiden_hybrid_concepts-run2-c3

Geometric concept representations in neural networks

Concepts encoded as curved manifolds and circular structures in LLM activation spaces.

38 members.


Drawn from 14 sources

The papers/notes whose extracted claims & findings make up this cluster.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior (9 members)
2026-05-14_phil-trans-A-goodfire-aboutblank-impact.md (6 members)
feucht-goodfire-geometric-calculator-2026.md (5 members)
The World Inside Neural Networks (4 members)
Steering Along Manifolds to Control Neural Networks (4 members)
Steering Along Manifolds to Control Neural Networks (3 members)
Diagrammatic Writing (3 members)
2026-05-15_manifold-overlap-papers-economy-strategy.md (3 members)
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders (3 members)
Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders (2 members)
unfold-chat-catalog.md (1 member)
Covariance-based Sequence Pooling (1 member)
Cybernetic Diagrams: Design Strategies for an Open Game (1 member)
Diagrammatic Writing (1 member)

Bridges (15)

Other communities that share members with this one: cross-cutting threads, or papers that sit at the seam between two themes.

Manifold-aware concept steering in neural representations (32 shared)
Design principles for care-centered systems (9 shared)
Neural geometry as fundamental computational substrate (9 shared)
Manifold-aware steering for language models (7 shared)
Manifold isometry between representations and behavior (5 shared)
Concept geometry and steering in neural networks (4 shared)
Relational semantics through spatial and narrative representation (3 shared)
Geometric goal-alignment in multi-scale systems (2 shared)
Concept steering & catastrophic model failure (2 shared)
Causal emergence in biological systems (1 shared)
Hierarchical spatial organization in biology (1 shared)
Collective intelligence & distributed cognition (1 shared)
Substrate-independent minimal cognition (1 shared)
Concept entanglement in biomedical foundation models (1 shared)
Hierarchical structure and multiscale coherence in physical systems (1 shared)

Claims (29)

Geometry in neural representation is not merely incidental but is in fact the proper object for enabling principled control via intervention on internals. Core interpretive assertion: geometric structure is causally load-bearing, not epiphenomenal.
A diagram is an image that works.
A diagram spatializes semantic value, using graphic features of spatial organization to express relations.
Conceptual geometry is consistent across representation space and behavior space. Interpretive assertion: the same geometric structure (e.g., circular for days) appears identically in both internal activations and output probabilities.
Curved manifolds often represent concepts better than linear directions. Proposes that nonlinear geometric structure is superior to linear feature spaces for capturing semantic content.
Diagrams can be understood as a strategy for dealing with the structural complexity of the apparatus, enabling the creation of new games. Following Flusser’s challenge, the paper claims that cybernetic diagrams are meta-tools for designing new rule systems.
Diagrams spatialize semantic value into legible graphical systems. A diagram is an image that works by converting semantic relations into spatial organization, making meaning through form rather than content alone.
Geometric structure in neural network representations drives model behavior. Interpretive assertion that representation geometry is not epiphenomenal but causally shapes what models do externally.
Geometric structure of neural representations causally shapes model behavior. The paper's core causal assertion: geometry is not incidental but mechanistically linked to behavior.
Geometry arises from optimization pressure on networks trained on structured data. Mechanistic explanation: geometric structure emerges naturally from standard training on data with underlying structure.
Geometry of features matters for representation quality. General principle supported tangentially by covariance-pooling work; relates to feature co-occurrence structure.
Hierarchy and subordination through spatial organization. Drucker argues that indentation, size, placement, and relative position create hierarchies not as moral values but as relational effects within a system.
Linear steering cuts through off-manifold regions and hence produces unnatural outputs. Attributes this failure to the Euclidean assumption behind linear steering.
Networks compute on geometric manifolds, and control should respect that geometry. Strong interpretive assertion linking discovery and control: neural computation is fundamentally manifold-structured.
Networks encode structured geometric concepts that reflect external reality. Core claim of the paper: the right level of description for neural representations is geometric structure mirroring the world.
Representation geometry and behavior geometry are bidirectionally aligned. Core finding: the structure models use internally (representations) is precisely reflected in their external behavior (outputs).
Some SAE concept-steering interventions act as 'wrecking balls' that collapse global model performance rather than selectively modifying target concepts. A critical failure mode identified in the paper, demonstrating the risk of naïve concept steering.
The core problem of steering should be recast from finding the right direction to finding the right geometry. The paper's programmatic conclusion about how the field should reconceptualize neural network steering.
There is a bidirectional relationship between the geometry of representation and behavior across tasks and modalities. Author’s interpretive claim that the shared geometry is general and robust.
Activation manifolds and behavior manifolds are approximately isometric across cyclic and sequential concepts.
Geometry unifies diverse neural architectures in machine learning systems.
Manifold-aware steering is genuinely new IP that frontier labs cannot ship as easily as assumed.
Manifold-aware steering is non-trivial IP requiring geometric analysis, not a system-prompt implementation.
Manifold-respecting steering produces smooth natural behavioral trajectories while linear steering teleports between non-adjacent concepts.
Neural networks compute cyclic concepts using generic substrate machinery (base-10 addition) rather than naturally cyclic computation.
Optimally steering model behavior requires isolating concept geometry and defining operators to navigate it.
Representation and computation can diverge: cyclic geometry is a representational invariant, while operations use generic substrate machinery.
Representation geometry causally shapes behavior; activation and behavior manifolds are approximately isometric.
Sparse low-cardinality circuits implement competence; 0.2% of neurons handle shared computation across all cyclic tasks.
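The contrast drawn in several claims above, that linear steering cuts through off-manifold regions while manifold-respecting steering stays on the concept geometry, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the procedure from any listed paper: seven weekday concepts placed on a unit circle in a 2D activation space, a softmax readout standing in for model behavior, and a hypothetical `SCALE` sharpness parameter. It steers from Monday toward Thursday and compares the halfway points of a straight-line path and an arc along the circle.

```python
import math

# Toy setup (assumption, not from the source papers): seven weekday concepts
# placed evenly on a unit circle in a 2D activation space.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
N = len(DAYS)
SCALE = 8.0  # hypothetical readout sharpness (inverse temperature)


def embed(k):
    """On-manifold activation for day k."""
    angle = 2 * math.pi * k / N
    return (math.cos(angle), math.sin(angle))


EMBEDDINGS = [embed(k) for k in range(N)]


def readout(h):
    """Toy 'behavior': softmax over dot products of h with each day embedding."""
    logits = [SCALE * (h[0] * e[0] + h[1] * e[1]) for e in EMBEDDINGS]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def linear_step(ka, kb, t):
    """Straight-line (Euclidean) interpolation between two day activations."""
    a, b = EMBEDDINGS[ka], EMBEDDINGS[kb]
    return ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])


def manifold_step(ka, kb, t):
    """Interpolation along the shorter arc of the circle (stays on the manifold)."""
    delta = (kb - ka) % N
    if delta > N / 2:
        delta -= N
    angle = 2 * math.pi * (ka + t * delta) / N
    return (math.cos(angle), math.sin(angle))


def on_target(p):
    """Mass on Tue + Wed, the concepts lying between Mon and Thu."""
    return p[1] + p[2]


def off_target(p):
    """Mass on Fri, Sat, Sun, the far side of the cycle."""
    return p[4] + p[5] + p[6]


# Steer from Monday (0) toward Thursday (3) and inspect the halfway point.
lin = linear_step(0, 3, 0.5)
man = manifold_step(0, 3, 0.5)
p_lin, p_man = readout(lin), readout(man)

for name, h, p in [("linear", lin, p_lin), ("manifold", man, p_man)]:
    print(f"{name:>8}: |h|={math.hypot(*h):.3f} "
          f"on-target={on_target(p):.3f} off-target={off_target(p):.4f}")
```

In this toy, both midpoints point in the same direction; the straight-line path simply shrinks toward the origin (its norm is cos(3π/7), roughly 0.22), which flattens the readout and leaks probability onto Friday through Sunday. The arc keeps the norm at 1 and concentrates mass on Tuesday and Wednesday, a small-scale analogue of "noisy off-target effects" versus "clean shifts between sequential concepts".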

Findings (9)

Optimizing interventions in activation space to produce paths along M_y recovers activation trajectories that trace the curvature of M_h. Demonstrates a bidirectional causal link: behavior-manifold geometry can be recovered by optimizing in representation space.
Interventions on some concepts act as 'wrecking-ball' interventions, collapsing global model performance. Observation of a catastrophic performance drop when steering certain concepts.
Concept steering, scored with a target vs. off-target probe-area metric, reveals three operational regimes (selectively steerable, encoded but entangled, non-encoded) across SleepFM, REVE, and LaBraM. Result categorizing concept steerability into three distinct regimes.
In the Mountain Car case study, car position is a 1D manifold; linear interventions cross voids, causing incoherence, while following the 1D curve produces smooth control. Empirical demonstration that a semantically meaningful variable is encoded as a curved manifold, and that respecting its geometry is critical for effective intervention.
Interventions along the activation manifold M_h yield behavioral trajectories that follow the behavior manifold M_y, and vice versa: a bidirectional relationship demonstrated across language models and video world models. Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
Linear steering produces noisy off-target effects; manifold steering cleanly shifts probability mass between sequential concepts. Core empirical claim comparing steering approaches on cyclic concepts.
Manifold-geometry principles extend to months, letters, ages, and in-context learning tasks across modalities. Evidence that the weekday cyclic structure is not anomalous but reflects a broader principle of concept geometry.
Manifold steering produces clean probability shifts along the natural behavior structure; linear steering cuts across the manifold and produces noisy off-target effects. Empirical demonstration on Llama-3.1-8B that steering along the representation manifold aligns outputs with the behavior manifold, whereas linear steering does not.
Our method enables bidirectional steering of model behavior. The method can steer the model in both positive and negative directions on the target semantic feature.
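Several findings rest on activation manifolds (M_h) and behavior manifolds (M_y) sharing approximately the same geometry. One simple way to probe that in a toy setting, assumed here and not necessarily the metric used in the listed papers, is to pair each activation with the output it produces, compute all pairwise distances on both sides, and correlate the two distance lists (a representational-similarity-style check). The sketch assumes circular weekday activations, a softmax readout with a hypothetical `SCALE`, Euclidean distance between output distributions, and Pearson correlation as the agreement score.

```python
import math

# Toy setup (assumption): circular weekday activations and a softmax readout.
N = 7
SCALE = 2.0  # hypothetical readout sharpness


def activation(k, radius=1.0):
    """Point k on the toy activation manifold M_h (a circle)."""
    angle = 2 * math.pi * k / N
    return (radius * math.cos(angle), radius * math.sin(angle))


def behavior(h):
    """Toy output distribution over the N days for activation h (a point on M_y)."""
    logits = []
    for j in range(N):
        e = activation(j)
        logits.append(SCALE * (h[0] * e[0] + h[1] * e[1]))
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


H = [activation(k) for k in range(N)]  # sampled points on M_h
Y = [behavior(h) for h in H]           # matching points on M_y
pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
d_h = [euclid(H[i], H[j]) for i, j in pairs]
d_y = [euclid(Y[i], Y[j]) for i, j in pairs]
r = pearson(d_h, d_y)
print(f"correlation of pairwise distances: {r:.3f}")
```

The correlation comes out high but below 1: the softmax compresses large separations, so behavior distances grow more slowly than activation distances for days far apart on the cycle. That is one concrete sense in which the two manifolds can be approximately, rather than exactly, isometric.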