AI
Interpretability, mechanistic understanding, model phenomenology, convergent representations, manifold steering.
Papers: 97 (+8)
Thinkers: 12
Frameworks: 12
Methods: 12
Claims: 1,277 (+1,277)
Findings: 1,001 (+1,001)
Hypotheses: 256
Communities: 12
New in this area (last 90d)
- Testing the Limits of Truth Directions in LLMs (2026), 8h ago
- Steering Evaluation-Aware Language Models to Act Like They Are Deployed (2025), 8h ago
- From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs (2025), 8h ago
- Model Alignment Search (2025), 8h ago
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions (2024), 9h ago
- cimcWhitepaper, 6d ago
- The Machine Consciousness Hypothesis, 6d ago
- Differentiable Logic Cellular Automata: From Game of Life to pattern generation with learned recurrent circuits, 7d ago
Cross-area bridges (15)
also-in: Entities that appear in this area AND ≥1 other area. These are the conceptual seams, usually the most fruitful place to look for new essay material.
- framework: Active Inference
- framework: Basal Cognition
- framework: Free Energy Principle
- framework: Tame Technological Approach To Mind Everywhere
- thinker: Michael Levin
- concept: Bioelectricity
- concept: Cognitive Light Cone
- concept: Collective Intelligence
- concept: Llama-3.1-8B-Instruct
- concept: Teleonomy
- framework: Attention Schema Theory
- framework: Autopoiesis
- framework: Computational Functionalism
- framework: Linear Representation Hypothesis
- framework: Multiscale Competency Architecture
Velocity movers (10)
new edges via area papers: Entities whose connection to this area grew most in the window, through new papers reinforcing existing thinkers, concepts, or frameworks.
- thinker: Michael Levin (+38)
- thinker: Atticus Geiger (+15)
- thinker: Karl Friston (+14)
- thinker: Christopher Potts (+12)
- framework: Basal Cognition (+11)
- community: LLM interpretability & self-awareness (+10)
- framework: Linear Representation Hypothesis (+10)
- community: LLM Interpretability & Behavioral Analysis (+10)
- thinker: Zhengxuan Wu (+9)
- concept: Bioelectricity (+8)
Top entities in AI
Thinkers (12)
papers: Authors of area papers, by paper count
Frameworks (12)
links: Frameworks introduced or extended by area papers
Methods (12)
links: Methods used in area papers
Concepts (12)
links: Concepts referenced by area papers
- Bioelectricity (8)
- Collective Intelligence (5)
- Cognitive Light Cone (4)
- Manifold Steering (4)
- Teleonomy (3)
- Truth direction universality (3)
- Unconventional embodiments (3)
- Eval Awareness (3)
- Causal abstraction (3)
- Llama-3.1-8B-Instruct (3)
- Residual Stream (3)
- Zou et al. 2023 - Representation Engineering: A Top-Down Approach to AI Transparency (3)
Communities (12)
members: Clusters with ≥2 area papers as members
- Mechanistic interpretability & model evaluation (156)
- Causal emergence in biological systems (74)
- Bioelectric morphogenesis & anatomical intelligence (70)
- Collective intelligence & distributed cognition (58)
- Design principles for care-centered systems (50)
- Few-shot anchoring & latent structure (48)
- Alive AI interface ethics & design (42)
- Manifold-aware concept steering in neural representations (36)
- LLM introspective awareness of injected concepts (25)
- Chain-of-Thought reasoning robustness & safety (25)
- Mechanistic interpretability via parameter decomposition (22)
- Mechanistic introspection in language models (22)
Top claims (10)
restates: Claims extracted from area papers, ranked by restate-degree
- All intelligence is collective intelligence: each of us consists of a huge number of cells working together to generate a coherent cognitive being with goals, preferences, and memories that belong to the whole and not to its parts. (5)
- All intelligences are collective intelligences: individual humans are collections of parts, competencies, drives, and tools both internal and external to the body. (3)
- All intelligences are collectives; individual intelligence arises from the interaction of many unintelligent components arranged in the right organisation. (3)
- Morphogenesis is a result of collective activity where cells cooperate toward a specific, invariant target morphology, exhibiting goal-directedness. (3)
- Morphogenesis is an instantiation of collective intelligence, exhibiting anatomical homeostasis and autonomous problem-solving in morphospace. (3)
- Newt kidney tubule cells, when artificially enlarged, can bend a single cell around itself to achieve correct tubule diameter, illustrating top-down control over molecular mechanisms. (3)
- All intelligence is collective intelligence, in the sense that it is made of parts that must align with respect to system-level goals. (2)
- All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals. (2)
- All intelligences are collectives and all individuals are collectives; individual and collective intelligence are not categorically distinct but unified by connectionist principles. (2)
- Tadpoles with ectopic eyes on their tails can see out of them, demonstrating radical functional plasticity without evolutionary adaptation. (2)
Top findings (10)
restates: Findings extracted from area papers, ranked by restate-degree
- Newt kidney tubule cells produce correct tubule diameter using fewer cells when cell size is enlarged; a single enlarged cell can loop to achieve the same diameter (Fankhauser 1945). (3)
- Single newt cell can wrap around itself to form a kidney tubule when cell size is artificially increased. (3)
- Tadpoles with ectopic eyes on tail can see and integrate sensory input from aberrant location (3)
- Ectopic eyes on tadpole tails support visual learning despite connecting to the spinal cord. (2)
- 17 of 83 tested emotions show significant association between self-eval transcript word mention and cosine similarity to emotion probe (1)
- 62% of emotions significantly elevated at 5 tokens after steering pulse ends (1)
- A small group of hidden states (group b) over end-of-sentence punctuation tokens is highly causally implicated in truth judgments (1)
- Age-pathology confounding is empirically demonstrated: suppressing age representation corrupts pathology representation in EEG foundation models. (1)
- Agentic self-evaluation emotionality correlates with SAE feature persistence: rho=+0.124, p=0.0001 (1)
- Cancer phenotypes can be suppressed by forcing bioelectrical connections among cells, overriding oncogenic mutations (Chernet & Levin 2013). (1)
Open questions (10)
Questions from area papers with no answers yet
- If we translate the Universality Hypothesis to the problem of consciousness, would it follow that an artificial system trained to perform the same tasks leading to consciousness formation in a human infant would exhibit consciousness as well?
- Do human participants demonstrate the same insight dynamics predicted by active inference in the rule-learning paradigm (currently under investigation with eye tracking and crowd-sourced reaction times)?
- Could it be possible to create new forms of consciousness and kinds of minds, capable of experiencing, reflecting and understanding reality at a level far beyond current human communication patterns?
- Does self-referential processing causally instantiate algorithmic properties proposed by consciousness theories (recurrent integration, global broadcasting, metacognitive monitoring) in LLMs?
- Is a mutual nearest-neighbor alignment score of 0.16 indicative of strong alignment with remaining gap being noise, or does it signify poor alignment with major differences left to explain?
- Does self-referential prompting actually instantiate architectural recursion, global broadcasting, or recurrent integration at the algorithmic level as proposed by consciousness theories?
- If the minimization of free energy is just a corollary of descent onto a global random attractor, does this mean that adaptation and evolution are just ways of describing the same thing?
- Whether consummatory hedonic responses involve goal-relative evaluation in the formal sense or represent a more primitive form of signed sensory assessment is an open empirical question
- Is the stronger persistence signal from agentic self-evaluation due to introspection per se, or due to the ability to test additional steering strengths including negative strengths?
- Granted that learning requires signed information, but why must the sign be felt? Why can't directional error be represented as a computational quantity without phenomenal character?
All papers in AI
Status:
97 of 97