Manifold-aware concept steering in neural representations

Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.

75 members.

Sub-communities (11)

Finer clusters this community splits into. Each is its own community page.

Neural geometry as fundamental computational substrate (11)
Geometric goal-alignment in multi-scale systems (9)
Self-correcting search in generative design (9)
Manifold-aware steering for language models (8)
Concept entanglement in biomedical foundation models (7)
Covariance pooling for high-dimensional genomic embeddings (7)
Symmetry constraints in Islamic geometric design (5)
Manifold isometry between representations and behavior (5)
Euclidean rhythms in world music (5)
Concept geometry and steering in neural networks (4)
Concept steering & catastrophic model failure (2)

Drawn from 22 sources

The papers/notes whose extracted claims & findings make up this cluster.

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders (9 members)
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior (7 members)
Using Self-Correcting Search to Accelerate Materials Discovery (7 members)
Covariance-based Sequence Pooling (7 members)
2026-05-14_phil-trans-A-goodfire-aboutblank-impact.md (5 members)
The World Inside Neural Networks (5 members)
2026-05-15_manifold-overlap-papers-economy-strategy.md (4 members)
Steering Along Manifolds to Control Neural Networks (4 members)
The Euclidean Algorithm Generates Traditional Musical Rhythms (4 members)
Steering Along Manifolds to Control Neural Networks (4 members)
unfold-chat-catalog.md (3 members)
Frieze Patterns of the Alhambra (3 members)
The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring (2 members)
Topological constraints on self-organization in locally interacting systems (2 members)
Endless forms most beautiful 2.0: teleonomy and the bioengineering of chimaeric and synthetic organisms (2 members)
2026 02 02_2328_Search_Papers_The Literature Shows Strong Theoretical Foundation (1 member)
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents (1 member)
Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders (1 member)
Biology, Buddhism, and AI: Care as the Driver of Intelligence (1 member)
Johnson Vasocomputation 2023 (1 member)
Emergent Introspective Awareness in Large Language Models (1 member)
Technological Approach to Mind Everywhere: An Experimentally-Grounded Framework for Understanding Diverse Bodies and Minds (1 member)

Bridges (20)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Geometric concept representations in neural networks (32 shared)
Neural geometry as fundamental computational substrate (11 shared)
Self-correcting search in generative design (9 shared)
Geometric goal-alignment in multi-scale systems (9 shared)
Manifold-aware steering for language models (8 shared)
Covariance pooling for high-dimensional genomic embeddings (7 shared)
Concept entanglement in biomedical foundation models (7 shared)
Covariance pooling for genomic embeddings (5 shared)
Symmetry constraints in Islamic geometric design (5 shared)
Euclidean rhythms in world music (5 shared)
Manifold isometry between representations and behavior (5 shared)
Concept geometry and steering in neural networks (4 shared)
Substrate-agnostic behavioral inference of cognition (3 shared)
SAE Feature Geometry in Biomedical Signals (3 shared)
Euclidean rhythms & Bjorklund's algorithm (2 shared)
Symmetry preferences in Islamic geometric tilings (2 shared)
Self-correcting search optimization (2 shared)
Teleonomy as universal behavioral invariant (2 shared)
Dictionary health audit transfer (2 shared)
Age-pathology concept entanglement (2 shared)

Claims (50)

Geometry in neural representation is not merely incidental, but is in fact the proper object for enabling principled control via intervention on internals.
  Core interpretive assertion: geometric structure is causally load-bearing, not epiphenomenal.
A single SAE hyperparameter procedure driven by an intrinsic dictionary health audit transfers robustly across all three EEG transformer architectures.
  Key methodological contribution claim about architecture-agnostic SAE tuning.
Age-pathology confounding prevents independent steering of age and pathology concepts.
  Interpretive assertion about clinical entanglement in the representations.
Bjorklund's algorithm has the same structure as the Euclidean algorithm.
  Key theoretical insight: both algorithms use repeated subtraction (division) to recursively partition sequences; this structural identity justifies the term 'Euclidean rhythm'.
Conceptual geometry is consistent across representation space and behavior space.
  Interpretive assertion: the same geometric structure (e.g. circular for days) appears identically in both internal activations and output probabilities.
Covariance pooling could generalize beyond genomics as a general-purpose replacement for mean pooling.
  Authors' suggestion that the second-moment preservation principle applies broadly, not just to genomic foundation models.
Covariance pooling preserves joint activation structure (feature co-occurrence) that mean pooling discards.
  Specific interpretive claim about what covariance pooling captures: the pairwise co-activation patterns across features that are invisible to mean pooling.
Curved manifolds often represent concepts better than linear directions.
  Proposes that nonlinear geometric structure is superior to linear feature spaces for capturing semantic content.
Euclidean strings are favored in classical, jazz, Bulgarian, Turkish and Persian music but not popular in African music.
Geometric structure in neural network representations drives model behavior.
  Interpretive assertion that representation geometry is not epiphenomenal but causally shapes what models do externally.
Geometric structure of neural representations causally shapes model behavior.
  The paper's core causal assertion: geometry is not incidental but mechanistically linked to behavior.
Geometry arises from optimization pressure on networks trained on structured data.
  Mechanistic explanation: geometric structure emerges naturally from standard training on data with underlying structure.
Geometry of features matters for representation quality.
  General principle supported tangentially by covariance pooling work; relates to feature co-occurrence structure.
Glide-reflections appear to be a less-favored symmetry for Islamic planar mosaic tilings.
  Bodner's interpretive finding based on comparative analysis; supported by Abas and Salman's statistical distribution data.
Glide-reflections appear to be less-favored symmetries in Islamic geometric patterns compared to other isometries.
Intelligence as Capacity for Goal-Directed Activity in Problem Space
Internal-state feedback steering is applicable to protein design and drug discovery beyond materials.
  Generalizes the mechanism to other molecular design domains.
Layer-wise geometry shows an early dip, mid-layer alignment, and late standardization across tasks.
  Qualitative pattern from E3.
Linear steering cuts through off-manifold regions and hence produces unnatural outputs.
  Attribution of failure to Euclidean assumption.
Navigation of problem spaces requires parts to align with system-level goals.
  Motivation for studying self-organization: understanding dynamics that facilitate or limit alignment across multiple scales.
Networks compute on geometric manifolds and control should respect that geometry.
  Strong interpretive assertion linking discovery and control: neural computation is fundamentally manifold-structured.
Networks encode structured geometric concepts that reflect external reality.
  Core claim of the paper: the right level of description for neural representations is geometric structure mirroring the world.
Representation geometry and behavior geometry are bidirectionally aligned.
  Core finding: the structure models use internally (representations) is precisely reflected in their external behavior (outputs).
SAE features can be grounded in clinical taxonomy (abnormality, age, sex, medication) to benchmark monosemanticity and entanglement.
  Claim that feature grounding enables interpretability metrics.
SAE features tend to shatter manifolds into many small and apparently-unrelated pieces, obscuring the overarching semantic structure.
  Core critique of sparse autoencoders: they break the geometric structure of representations, making it harder to see the big picture.
Second moments preserve structure that first moments destroy.
  Core interpretive claim generalizing beyond genomics; argues mean pooling discards information present in covariance.
Self-correcting search is applicable to protein design and drug discovery.
  Claim by the authors that the self-correcting search method can be extended to protein design and drug discovery.
Self-correcting search employs the same conceptual move as Wurgaft's manifold steering, applied to chemistry instead of LMs.
  Interpretive assertion that the internal-state feedback mechanism mirrors manifold steering from prior work.
Self-correcting search improves viable candidate success rate from 6.5% to ~30% (4.6x improvement).
  Interpretive claim that the method dramatically boosts success rate over the MatterGen baseline.
Self-correcting search is Pareto-optimal across tested conditioning strengths.
  Asserts that the method maintains efficiency across a range of constraint strengths without degradation.
+20 more
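
The claim that second moments preserve structure that first moments destroy can be illustrated with a toy example. The sketch below is not the implementation from the covariance pooling source; the function names `mean_pool` and `covariance_pool` and the two toy sequences are assumptions made for illustration. It builds two token sequences whose features have the same mean but opposite co-activation, so mean pooling cannot tell them apart while covariance pooling can.

```python
import numpy as np

def mean_pool(tokens):
    """First moment: average each feature over the token axis."""
    return tokens.mean(axis=0)

def covariance_pool(tokens):
    """Second moment: feature-by-feature covariance over tokens,
    flattened to its upper triangle because the matrix is symmetric."""
    cov = np.cov(tokens, rowvar=False, bias=True)
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]

# Hypothetical toy activations with shape (tokens, features).
# In co_active the two features rise and fall together;
# in anti_active one rises when the other falls.
co_active = np.array([[1.0, 1.0], [-1.0, -1.0]])
anti_active = np.array([[1.0, -1.0], [-1.0, 1.0]])

same_mean = np.allclose(mean_pool(co_active), mean_pool(anti_active))
same_cov = np.allclose(covariance_pool(co_active), covariance_pool(anti_active))
print("mean pooling identical:", same_mean)
print("covariance pooling identical:", same_cov)
```

The off-diagonal covariance entry carries the sign of the co-activation, which is the pairwise structure the claims above describe as invisible to mean pooling.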

Findings (25)

Optimizing interventions in activation space to produce paths along M_y recovers activation trajectories that trace the curvature of M_h.
  Demonstrates bidirectional causal link: behavior manifold geometry can be recovered by optimizing in representation space.
A single hyperparameter procedure driven by the intrinsic dictionary health audit transfers robustly across SleepFM, REVE, and LaBraM.
  Demonstrates architecture-agnostic applicability of the SAE tuning method.
Age-pathology confounding observed: impossible to suppress one concept without corrupting the other.
  Empirical demonstration of entanglement between age and pathology features.
Baseline MatterGen achieves 6.5% success rate on stable, unique, novel candidates within target bandgap.
  Quantitative baseline establishing the performance floor for self-correcting search improvements.
Concept interventions on some concepts act as 'wrecking-ball' interventions, collapsing global model performance.
  Observation of catastrophic performance drop when steering certain concepts.
Concept steering with target vs off-target probe area metric reveals three operational regimes (selectively steerable, encoded but entangled, non-encoded) across SleepFM, REVE, LaBraM.
  Result categorizing concept steerability into three distinct regimes.
Covariance pooling achieves +52.9% R² improvement over mean pooling on Genomic Track Prediction.
  Primary empirical result demonstrating practical utility of covariance pooling method.
Covariance pooling compresses gigabytes of activations into compact stable embeddings without large labeled datasets.
  Practical finding: the method produces compact fixed-length representations from large volumes of token activations without requiring supervised labels.
Days-of-Week Cyclic Structure
  Key empirical result: days-of-week appear as the same circular manifold in both Llama-3.1-8B internal activations and output token probability distributions.
Distributed cognition in aviation operations examined via network analysis of gate-to-gate operations.
  Empirical study showing distributed cognitive processes across multiple human agents and systems; provides precedent for non-AI distributed cognition.
Euclidean string status correlates with cultural musical preference.
  Empirical observation: Euclidean strings favored in classical/jazz/Persian music; reverse Euclidean strings have wider appeal; non-Euclidean rhythms used in sub-Saharan African music.
Gene Ontology prediction: +8.4% AUC improvement with unsupervised autoencoder and covariance pooling embeddings.
  Empirical result: covariance pooling combined with unsupervised autoencoder embeddings improves Gene Ontology prediction AUC by 8.4% over mean pooling.
The geometry-behavior correlate is robust to pooling strategy, distance metric, and frozen encoder.
  Robustness checks confirm sign stability.
In the Mountain Car case study, car position is a 1D manifold; linear interventions cross voids, causing incoherence; following the 1D curve produces smooth control.
  Empirical demonstration that a semantically meaningful variable is encoded as a curved manifold, and that respecting its geometry is critical for effective intervention.
Intentional Control of Internal States
  Models can modulate their internal representations when instructed or incentivized to 'think about' a concept; effect replicates across all tested models regardless of capability.
Interventions along activation manifold M_h yield behavioral trajectories following behavior manifold M_y, and vice versa; this bidirectional relationship is demonstrated across language models and video world models.
  Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
Linear steering produces noisy off-target effects; manifold steering cleanly shifts probability mass between sequential concepts.
  Core empirical claim comparing steering approaches on cyclic concepts.
Manifold geometry principles extend to months, letters, ages, and in-context learning tasks across modalities.
  Evidence that the weekday cyclic structure is not anomalous but reflects broader principle of concept geometry.
Manifold steering produces clean probability shifts along natural behavior structure; linear steering cuts across the manifold and produces noisy off-target effects.
  Empirical demonstration on Llama-3.1-8B that steering along representation manifold aligns outputs with behavior manifold, whereas linear steering does not.
Many rhythms used in world music are Euclidean rhythms generated by Bjorklund's algorithm.
Monosemanticity and entanglement of SAE features were benchmarked for clinical taxonomy grounding across SleepFM, REVE, LaBraM.
  Quantitative assessment of feature quality using clinical concepts across models.
Our method enables bidirectional steering of model behavior.
  The method can steer the model in both positive and negative directions on the target semantic.
Self-correcting search yields ~+30% improvement in viable candidates within target bandgap range.
  Main empirical result: interpretability-driven feedback increases discovery efficiency significantly.
SFR-DR-20B achieves 28.7% on Humanity's Last Exam full text-only benchmark, 65% relative improvement over gpt-oss-20b baseline.
  Main evaluation result showing best variant outperforms many proprietary and open-source baselines of comparable or larger sizes.
The p1g1 frieze class appears very rarely in Islamic mosaic tilings at the Alhambra and was entirely absent at Real Alcázar.
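
The contrast between linear and manifold steering in the findings above can be sketched on a toy circle. This is a geometric illustration only, under the assumption that seven "day" concepts sit at equal angles on a unit circle in 2D; it does not use any model activations, and `day_point`, `linear_path`, `manifold_path`, and `nearest_day` are hypothetical helper names. A straight-line interpolation between two days passes through the interior of the circle (off the manifold), while interpolating the angle stays on it.

```python
import numpy as np

N_DAYS = 7  # assumed toy layout: one point per weekday on a unit circle

def day_point(index):
    """Position of a (possibly fractional) day index on the unit circle."""
    angle = 2 * np.pi * index / N_DAYS
    return np.array([np.cos(angle), np.sin(angle)])

def linear_path(start, end, steps=11):
    """Straight-line ('linear steering') interpolation between two days."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - t) * day_point(start) + t * day_point(end)

def manifold_path(start, end, steps=11):
    """Interpolate the angle instead, so every point stays on the circle."""
    angles = 2 * np.pi * np.linspace(start, end, steps) / N_DAYS
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def nearest_day(point):
    """Index of the day whose circle position is closest to `point`."""
    days = np.stack([day_point(i) for i in range(N_DAYS)])
    return int(np.argmin(np.linalg.norm(days - point, axis=1)))

# Steer from day 0 to day 3 both ways and compare distance from the origin.
line = linear_path(0, 3)
arc = manifold_path(0, 3)
print("min radius on linear path:", np.linalg.norm(line, axis=1).min())
print("min radius on manifold path:", np.linalg.norm(arc, axis=1).min())
print("days visited along the arc:", [nearest_day(p) for p in arc])
```

In this toy setting the arc passes the intermediate days in order, while the midpoint of the straight line lies far from every day. That is only a geometric analogue of the off-target effects reported for linear steering, not a reproduction of them.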
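
The Euclidean rhythms mentioned in the claims and findings can be generated with a short grouping procedure. The sketch below is one common way to write Bjorklund-style grouping, not code from the cited source, and `euclidean_rhythm` is a hypothetical name. It starts with `k` onset groups and `n - k` rest groups, then repeatedly appends remainder groups to the leading groups. The two list lengths shrink the way the operands do in Euclid's algorithm by repeated subtraction: for 3 onsets in 8 steps they go (3, 5), (3, 2), (2, 1).

```python
def euclidean_rhythm(k, n):
    """Distribute k onsets (1) as evenly as possible over n steps.

    Returns a list of 0/1 values of length n. Requires 0 < k <= n.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    head = [[1] for _ in range(k)]       # groups that start with an onset
    tail = [[0] for _ in range(n - k)]   # remainder groups still to distribute
    while len(tail) > 1:
        pairs = min(len(head), len(tail))
        merged = [head[i] + tail[i] for i in range(pairs)]
        # Whatever was not paired becomes the new remainder,
        # mirroring the subtraction step in Euclid's algorithm.
        leftover = head[pairs:] if len(head) > len(tail) else tail[pairs:]
        head, tail = merged, leftover
    return [step for group in head + tail for step in group]

def as_text(pattern):
    """Render onsets as 'x' and rests as '.' for readability."""
    return "".join("x" if step else "." for step in pattern)

print(as_text(euclidean_rhythm(3, 8)))  # x..x..x.
print(as_text(euclidean_rhythm(5, 8)))  # x.xx.xx.
```

Other formulations produce rotations of the same patterns; the starting point of the cycle is a convention rather than part of the rhythm.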