community
active
leiden_hybrid_concepts
label: sonnet
community:leiden_hybrid_concepts-run4-c7
Manifold-aware concept steering in neural representations
Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.
75 members.
Sub-communities (11)
Finer clusters this community splits into. Each is its own community page.
- Neural geometry as fundamental computational substrate (11 members)
- Geometric goal-alignment in multi-scale systems (9 members)
- Self-correcting search in generative design (9 members)
- Manifold-aware steering for language models (8 members)
- Concept entanglement in biomedical foundation models (7 members)
- Covariance pooling for high-dimensional genomic embeddings (7 members)
- Symmetry constraints in Islamic geometric design (5 members)
- Manifold isometry between representations and behavior (5 members)
- Euclidean rhythms in world music (5 members)
- Concept geometry and steering in neural networks (4 members)
- Concept steering & catastrophic model failure (2 members)
Drawn from 22 sources
The papers/notes whose extracted claims & findings make up this cluster.
- Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders (9 members)
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior (7 members)
- Using Self-Correcting Search to Accelerate Materials Discovery (7 members)
- Covariance-based Sequence Pooling (7 members)
- 2026-05-14_phil-trans-A-goodfire-aboutblank-impact.md (5 members)
- The World Inside Neural Networks (5 members)
- 2026-05-15_manifold-overlap-papers-economy-strategy.md (4 members)
- Steering Along Manifolds to Control Neural Networks (4 members)
- The Euclidean Algorithm Generates Traditional Musical Rhythms (4 members)
- Steering Along Manifolds to Control Neural Networks (4 members)
- unfold-chat-catalog.md (3 members)
- Frieze Patterns of the Alhambra (3 members)
- The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring (2 members)
- Topological constraints on self-organization in locally interacting systems (2 members)
- Endless forms most beautiful 2.0: teleonomy and the bioengineering of chimaeric and synthetic organisms (2 members)
- 2026 02 02_2328_Search_Papers_The Literature Shows Strong Theoretical Foundation (1 member)
- SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents (1 member)
- Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders (1 member)
- Biology, Buddhism, and AI: Care as the Driver of Intelligence (1 member)
- Johnson Vasocomputation 2023 (1 member)
- Emergent Introspective Awareness in Large Language Models (1 member)
- Technological Approach to Mind Everywhere: An Experimentally-Grounded Framework for Understanding Diverse Bodies and Minds (1 member)
Bridges (20)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
- Geometric concept representations in neural networks (32 shared)
- Neural geometry as fundamental computational substrate (11 shared)
- Self-correcting search in generative design (9 shared)
- Geometric goal-alignment in multi-scale systems (9 shared)
- Manifold-aware steering for language models (8 shared)
- Covariance pooling for high-dimensional genomic embeddings (7 shared)
- Concept entanglement in biomedical foundation models (7 shared)
- Covariance pooling for genomic embeddings (5 shared)
- Symmetry constraints in Islamic geometric design (5 shared)
- Euclidean rhythms in world music (5 shared)
- Manifold isometry between representations and behavior (5 shared)
- Concept geometry and steering in neural networks (4 shared)
- Substrate-agnostic behavioral inference of cognition (3 shared)
- SAE Feature Geometry in Biomedical Signals (3 shared)
- Euclidean rhythms & Bjorklund's algorithm (2 shared)
- Symmetry preferences in Islamic geometric tilings (2 shared)
- Self-correcting search optimization (2 shared)
- Teleonomy as universal behavioral invariant (2 shared)
- Dictionary health audit transfer (2 shared)
- Age-pathology concept entanglement (2 shared)
Claims (50)
- Geometry in neural representation is not merely incidental; it is the proper object for principled control via intervention on internals.
  Core interpretive assertion: geometric structure is causally load-bearing, not epiphenomenal.
- A single SAE hyperparameter procedure driven by an intrinsic dictionary health audit transfers robustly across all three EEG transformer architectures.
  Key methodological contribution claim about architecture-agnostic SAE tuning.
- Age-pathology confounding prevents independent steering of age and pathology concepts.
  Interpretive assertion about clinical entanglement in the representations.
- Bjorklund's algorithm has the same structure as the Euclidean algorithm.
  Key theoretical insight: both algorithms use repeated subtraction (division) to recursively partition sequences; this structural identity justifies the term 'Euclidean rhythm'.
- Conceptual geometry is consistent across representation space and behavior space.
  Interpretive assertion: the same geometric structure (e.g. circular for days) appears identically in both internal activations and output probabilities.
- Covariance pooling could generalize beyond genomics as a general-purpose replacement for mean pooling.
  Authors' suggestion that the second-moment preservation principle applies broadly, not just to genomic foundation models.
- Covariance pooling preserves joint activation structure (feature co-occurrence) that mean pooling discards.
  Specific interpretive claim about what covariance pooling captures: the pairwise co-activation patterns across features that are invisible to mean pooling.
- Curved manifolds often represent concepts better than linear directions.
  Proposes that nonlinear geometric structure is superior to linear feature spaces for capturing semantic content.
- Euclidean strings are favored in classical, jazz, Bulgarian, Turkish and Persian music but are not popular in African music.
- Geometric structure in neural network representations drives model behavior.
  Interpretive assertion that representation geometry is not epiphenomenal but causally shapes what models do externally.
- Geometric structure of neural representations causally shapes model behavior.
  The paper's core causal assertion: geometry is not incidental but mechanistically linked to behavior.
- Geometry arises from optimization pressure on networks trained on structured data.
  Mechanistic explanation: geometric structure emerges naturally from standard training on data with underlying structure.
- Geometry of features matters for representation quality.
  General principle supported tangentially by covariance pooling work; relates to feature co-occurrence structure.
- Glide-reflections appear to be a less-favored symmetry for Islamic planar mosaic tilings.
  Bodner's interpretive finding based on comparative analysis; supported by Abas and Salman's statistical distribution data.
- Glide-reflections appear to be less-favored symmetries in Islamic geometric patterns compared to other isometries.
- Intelligence as Capacity for Goal-Directed Activity in Problem Space
- Internal-state feedback steering is applicable to protein design and drug discovery beyond materials.
  Generalizes the mechanism to other molecular design domains.
- Layer-wise geometry shows an early dip, mid-layer alignment, and late standardization across tasks.
  Qualitative pattern from E3.
- Linear steering cuts through off-manifold regions and hence produces unnatural outputs.
  Attribution of failure to the Euclidean assumption.
- Navigation of problem spaces requires parts to align with system-level goals.
  Motivation for studying self-organization: understanding dynamics that facilitate or limit alignment across multiple scales.
- Networks compute on geometric manifolds and control should respect that geometry.
  Strong interpretive assertion linking discovery and control: neural computation is fundamentally manifold-structured.
- Networks encode structured geometric concepts that reflect external reality.
  Core claim of the paper: the right level of description for neural representations is geometric structure mirroring the world.
- Representation geometry and behavior geometry are bidirectionally aligned.
  Core finding: the structure models use internally (representations) is precisely reflected in their external behavior (outputs).
- SAE features can be grounded in clinical taxonomy (abnormality, age, sex, medication) to benchmark monosemanticity and entanglement.
  Claim that feature grounding enables interpretability metrics.
- SAE features tend to shatter manifolds into many small and apparently unrelated pieces, obscuring the overarching semantic structure.
  Core critique of sparse autoencoders: they break the geometric structure of representations, making it harder to see the big picture.
- Second moments preserve structure that first moments destroy.
  Core interpretive claim generalizing beyond genomics; argues mean pooling discards information present in covariance.
- Self-correcting search is applicable to protein design and drug discovery.
  Claim by the authors that the self-correcting search method can be extended to protein design and drug discovery.
- Self-correcting search employs the same conceptual move as Wurgaft's manifold steering, applied to chemistry instead of LMs.
  Interpretive assertion that the internal-state feedback mechanism mirrors manifold steering from prior work.
- Self-correcting search improves viable candidate success rate from 6.5% to ~30% (4.6x improvement).
  Interpretive claim that the method dramatically boosts success rate over the MatterGen baseline.
- Self-correcting search is Pareto-optimal across tested conditioning strengths.
  Asserts that the method maintains efficiency across a range of constraint strengths without degradation.
- +20 more
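Several of the claims above contrast linear steering with steering along a curved manifold. The following is a toy sketch of that contrast, not the method of any source listed here. It assumes a hypothetical 2D embedding in which seven "day" concepts sit evenly on a unit circle; the helper names (`day_point`, `linear_path`, `arc_path`) are invented for illustration. A straight-line path between two days leaves the circle and passes close to the origin, while a path along the arc keeps radius 1 throughout.

```python
import math

def day_point(day_index, n_days=7):
    """Place a (possibly fractional) day index on the unit circle (toy assumption)."""
    angle = 2 * math.pi * day_index / n_days
    return (math.cos(angle), math.sin(angle))

def linear_path(start, end, steps):
    """Straight-line interpolation between two points (the 'linear steering' analogue)."""
    path = []
    for i in range(steps + 1):
        t = i / steps
        path.append(((1 - t) * start[0] + t * end[0],
                     (1 - t) * start[1] + t * end[1]))
    return path

def arc_path(start_day, end_day, steps, n_days=7):
    """Interpolate the day index itself, so every point stays on the circle."""
    return [day_point(start_day + (end_day - start_day) * i / steps, n_days)
            for i in range(steps + 1)]

def radius(point):
    """Distance from the origin; equals 1.0 for points on the toy manifold."""
    return math.hypot(point[0], point[1])

monday, thursday = 0, 3  # hypothetical indices on the 7-day circle
lin = linear_path(day_point(monday), day_point(thursday), steps=10)
arc = arc_path(monday, thursday, steps=10)

min_lin_radius = min(radius(p) for p in lin)  # chord dips well inside the circle
min_arc_radius = min(radius(p) for p in arc)  # arc stays at radius 1

print(f"closest approach to origin, linear path: {min_lin_radius:.3f}")
print(f"closest approach to origin, arc path:    {min_arc_radius:.3f}")
```

In this toy setting the midpoint of the chord is the "off-manifold" region: it is not near any day's position, whereas the arc path passes through the intermediate days in order.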
Findings (25)
- Optimizing interventions in activation space to produce paths along M_y recovers activation trajectories that trace the curvature of M_h.
  Demonstrates bidirectional causal link: behavior manifold geometry can be recovered by optimizing in representation space.
- A single hyperparameter procedure driven by the intrinsic dictionary health audit transfers robustly across SleepFM, REVE, and LaBraM.
  Demonstrates architecture-agnostic applicability of the SAE tuning method.
- Age-pathology confounding observed: impossible to suppress one concept without corrupting the other.
  Empirical demonstration of entanglement between age and pathology features.
- Baseline MatterGen achieves 6.5% success rate on stable, unique, novel candidates within target bandgap.
  Quantitative baseline establishing the performance floor for self-correcting search improvements.
- Interventions on some concepts act as 'wrecking-ball' interventions, collapsing global model performance.
  Observation of catastrophic performance drop when steering certain concepts.
- Concept steering with target vs off-target probe area metric reveals three operational regimes (selectively steerable, encoded but entangled, non-encoded) across SleepFM, REVE, LaBraM.
  Result categorizing concept steerability into three distinct regimes.
- Covariance pooling achieves +52.9% R² improvement over mean pooling on Genomic Track Prediction.
  Primary empirical result demonstrating practical utility of covariance pooling method.
- Covariance pooling compresses gigabytes of activations into compact stable embeddings without large labeled datasets.
  Practical finding: the method produces compact fixed-length representations from large volumes of token activations without requiring supervised labels.
- Days-of-Week Cyclic Structure.
  Key empirical result: days-of-week appear as an identical circular manifold in both Llama-3.1-8B internal activations and output token probability distributions.
- Distributed cognition in aviation operations examined via network analysis of gate-to-gate operations.
  Empirical study showing distributed cognitive processes across multiple human agents and systems; provides precedent for non-AI distributed cognition.
- Euclidean string status correlates with cultural musical preference.
  Empirical observation: Euclidean strings favored in classical/jazz/Persian music; reverse Euclidean strings have wider appeal; non-Euclidean rhythms used in sub-Saharan African music.
- Gene Ontology prediction: +8.4% AUC improvement with unsupervised autoencoder and covariance pooling embeddings.
  Empirical result: covariance pooling combined with unsupervised autoencoder embeddings improves Gene Ontology prediction AUC by 8.4% over mean pooling.
- Geometry-behavior correlate robust to pooling strategy, distance metric, and frozen encoder.
  Robustness checks confirm sign stability.
- In the Mountain Car case study, car position is a 1D manifold; linear interventions cross voids, causing incoherence; following the 1D curve produces smooth control.
  Empirical demonstration that a semantically meaningful variable is encoded as a curved manifold, and that respecting its geometry is critical for effective intervention.
- Intentional Control of Internal States.
  Models can modulate their internal representations when instructed or incentivized to 'think about' a concept; effect replicates across all tested models regardless of capability.
- Interventions along activation manifold M_h yield behavioral trajectories following behavior manifold M_y, and vice versa; the bidirectional relationship is demonstrated across language models and video world models.
  Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
- Linear steering produces noisy off-target effects; manifold steering cleanly shifts probability mass between sequential concepts.
  Core empirical claim comparing steering approaches on cyclic concepts.
- Manifold geometry principles extend to months, letters, ages, and in-context learning tasks across modalities.
  Evidence that the weekday cyclic structure is not anomalous but reflects a broader principle of concept geometry.
- Manifold steering produces clean probability shifts along natural behavior structure; linear steering cuts across the manifold and produces noisy off-target effects.
  Empirical demonstration on Llama-3.1-8B that steering along the representation manifold aligns outputs with the behavior manifold, whereas linear steering does not.
- Many rhythms used in world music are Euclidean rhythms generated by Bjorklund's algorithm.
- Monosemanticity and entanglement of SAE features were benchmarked for clinical taxonomy grounding across SleepFM, REVE, LaBraM.
  Quantitative assessment of feature quality using clinical concepts across models.
- Our method enables bidirectional steering of model behavior.
  The method can steer the model in both positive and negative directions on the target semantic.
- Self-correcting search yields ~+30% improvement in viable candidates within target bandgap range.
  Main empirical result: interpretability-driven feedback increases discovery efficiency significantly.
- SFR-DR-20B achieves 28.7% on Humanity's Last Exam full text-only benchmark, 65% relative improvement over gpt-oss-20b baseline.
  Main evaluation result showing best variant outperforms many proprietary and open-source baselines of comparable or larger sizes.
- The p1g1 frieze class appears very rarely in Islamic mosaic tilings at the Alhambra and was entirely absent at Real Alcázar.
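The findings above state that many world-music rhythms are Euclidean rhythms generated by Bjorklund's algorithm. As an illustration, here is a minimal sketch of one common formulation of that procedure: repeatedly pair each "group" sequence with a "remainder" sequence until at most one remainder is left. The function name `bjorklund` and its parameters are chosen for this example and are not taken from the source paper. The shrinking counts of groups and remainders follow the same pattern as the remainders in Euclid's gcd algorithm, which is the structural parallel the cluster's claims refer to.

```python
def bjorklund(onsets, steps):
    """Distribute `onsets` pulses as evenly as possible over `steps` slots.

    Returns a list of 1s (onsets) and 0s (rests). Sketch of the pairing
    formulation of Bjorklund's algorithm; names are illustrative.
    """
    if not 0 <= onsets <= steps:
        raise ValueError("need 0 <= onsets <= steps")
    if onsets == 0:
        return [0] * steps

    groups = [[1] for _ in range(onsets)]              # sequences that start with an onset
    remainders = [[0] for _ in range(steps - onsets)]  # sequences still to be distributed

    # Each pass appends one remainder to each group, like one subtraction
    # step of Euclid's algorithm; whatever is left over becomes the new remainder.
    while len(remainders) > 1:
        pairs = min(len(groups), len(remainders))
        merged = [groups[i] + remainders[i] for i in range(pairs)]
        leftover = groups[pairs:] or remainders[pairs:]
        groups, remainders = merged, leftover

    return [bit for seq in groups + remainders for bit in seq]


tresillo = bjorklund(3, 8)    # [1, 0, 0, 1, 0, 0, 1, 0]
cinquillo = bjorklund(5, 8)   # [1, 0, 1, 1, 0, 1, 1, 0]
print("".join("x" if b else "." for b in tresillo))
print("".join("x" if b else "." for b in cinquillo))
```

The two examples correspond to the patterns usually written E(3,8) and E(5,8), known as the Cuban tresillo and cinquillo.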