AI

Interpretability, mechanistic understanding, model phenomenology, convergent representations, manifold steering.

Papers

Thinkers

Frameworks

Methods

Claims

1,277

+1277

Findings

1,001

+1001

Hypotheses

256

Communities

New in this area (last 90d)

Testing the Limits of Truth Directions in LLMs (2026) · 8h ago
Steering Evaluation-Aware Language Models to Act Like They Are Deployed (2025) · 8h ago
From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs (2025) · 8h ago
Model Alignment Search (2025) · 8h ago
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions (2024) · 9h ago
cimcWhitepaper · 6d ago
The Machine Consciousness Hypothesis · 6d ago
Differentiable Logic Cellular Automata: From Game of Life to pattern generation with learned recurrent circuits · 7d ago

Cross-area bridges (15)

also-in

Entities that appear in this area AND ≥1 other area. The conceptual seams — usually the most fruitful place to look for new essay material.

Type	Entity	Also in
framework	Active Inference	alexander consciousness levin zen
framework	Basal Cognition	alexander consciousness levin zen
framework	Free Energy Principle	alexander consciousness levin zen
framework	Tame Technological Approach To Mind Everywhere	alexander consciousness levin zen
thinker	Michael Levin	alexander consciousness levin zen
concept	Bioelectricity	consciousness levin zen
concept	Cognitive Light Cone	consciousness levin zen
concept	Collective Intelligence	alexander consciousness levin
concept	Llama-3.1-8B-Instruct	alexander consciousness zen
concept	Teleonomy	consciousness levin zen
framework	Attention Schema Theory	consciousness levin zen
framework	Autopoiesis	consciousness levin zen
framework	Computational Functionalism	consciousness levin zen
framework	Linear Representation Hypothesis	alexander consciousness zen
framework	Multiscale Competency Architecture	consciousness levin zen

Velocity movers (10)

new edges via area papers

Entities whose connection to this area grew most in the window — new papers reinforcing existing thinkers / concepts / frameworks.

Type	Entity	New edges
thinker	Michael Levin	+38
thinker	Atticus Geiger	+15
thinker	Karl Friston	+14
thinker	Christopher Potts	+12
framework	Basal Cognition	+11
community	LLM interpretability & self-awareness	+10
framework	Linear Representation Hypothesis	+10
community	LLM Interpretability & Behavioral Analysis	+10
thinker	Zhengxuan Wu	+9
concept	Bioelectricity	+8

Top entities in AI

Thinkers (12)

papers

Authors of area papers, by paper count

Frameworks (12)

links

Frameworks introduced or extended by area papers

Methods (12)

links

Methods used in area papers

Concepts (12)

links

Concepts referenced by area papers

Communities (12)

members

Clusters with ≥2 area papers as members

Top claims (10)

restates

Claims extracted from area papers, ranked by restate-degree

Top findings (10)

restates

Findings extracted from area papers, ranked by restate-degree

Open questions (10)

Questions from area papers with no answers yet

All papers in AI

Status:

97 of 97

Ingested	Title and authors	Refs	Cited	Source
8h ago	Addressing divergent representations from causal interventions on neural networks Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts	47	0	arXiv ↗
8h ago	Testing the Limits of Truth Directions in LLMs Angelos Poulis · Mark Crovella · Evimaria Terzi	20	0	arXiv ↗
8h ago	Steering Evaluation-Aware Language Models to Act Like They Are Deployed Tim Tian Hua · Andrew Qin · Samuel Marks · Neel Nanda	—	0	arXiv ↗
8h ago	From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4	45	0	arXiv ↗
8h ago	Model Alignment Search Satchel Grant	85	0	arXiv ↗
9h ago	The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? Denis Sutter · Julian Minder · Thomas Hofmann · Tiago Pimentel	67	0	arXiv ↗
9h ago	pyvene: A Library for Understanding and Improving PyTorch Models via Interventions Zhengxuan Wu · Atticus Geiger · Aryaman Arora · Jing Huang +4	21	1	arXiv ↗
9h ago	CausalGym: Benchmarking causal interpretability methods on linguistic tasks Aryaman Arora · Dan Jurafsky · Christopher Potts	54	0	arXiv ↗
9h ago	The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets Samuel Marks · Max Tegmark	52	17	arXiv ↗
9h ago	Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1	48	9	arXiv ↗
2026-06-03	cimcWhitepaper	—	—
2026-06-03	The Machine Consciousness Hypothesis	—	—
2026-06-02	Differentiable Logic Cellular Automata: From Game of Life to pattern generation with learned recurrent circuits	—	—
2026-06-02	Active Inference with a Self-Prior in the Mirror-Mark Task Dongmin Kim · Hoshinori Kanazawa · Yasuo Kuniyoshi	23	0	arXiv ↗
2026-06-02	Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis Jingkai Li	97	0	arXiv ↗
2026-06-02	When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models Kai Wang · Yihao Zhang · Meng Sun	27	0	arXiv ↗
2026-06-02	Endogenous Resistance to Activation Steering in Language Models Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5	28	0	arXiv ↗
2026-06-02	ReflCtrl: Controlling LLM Reflection via Representation Engineering Ge Yan · Chung-En Sun · Tsui-Wei Weng	17	0	arXiv ↗
2026-06-02	Unveiling the Latent Directions of Reflection in Large Language Models Fu-Chieh Chang · Yu-Ting Lee · Pei-Yuan Wu	40	0	arXiv ↗
2026-06-02	Contemplative Agent Ruben Laukkonen · Fionn Inglis · Shamil Chandaria · Lars Sandved-Smith +4	200	1	arXiv ↗
2026-06-02	The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1	—	0	arXiv ↗
2026-06-02	Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation Nicolas Martorell · Bruno Bianchi	70	0	arXiv ↗
2026-06-02	Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1	16	0	arXiv ↗
2026-06-02	Alignment faking in large language models Ryan Greenblatt · Carson Denison · Benjamin Fletcher Wright · Fabien Roger +16	—	19	arXiv ↗
2026-06-02	Psychological Steering of Large Language Models Leonardo Blas · Robin Jia · Emilio Ferrara	—	0	arXiv ↗
2026-06-02	Persistence and Introspection of Emotion Features Scott Sauers · Imago · Janus · Antra Tessera	—	—
2026-06-02	Large Language Models Report Subjective Experience Under Self-Referential Processing Cameron Berg · Diogo de Lucena · Judd Rosenblatt	56	0	arXiv ↗
2026-06-02	Why Learning Requires Feeling Cameron Berg	—	0	DOI ↗
2026-06-02	A Mathematical Framework for Transformer Circuits	—	—
2026-06-02	Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13	44	—	arXiv ↗
2026-06-01	Relating transformers to models and neural representations of the hippocampal formation James C. R. Whittington · Joseph W. Warren · Timothy E.J. Behrens	38	59	arXiv ↗
2026-06-01	Towards Safe and Honest AI Agents with Neural Self-Other Overlap Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1	37	0	arXiv ↗
2026-06-01	The Platonic Representation Hypothesis Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola	143	25	arXiv ↗
2026-06-01	Zoom In: An Introduction to Circuits Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2	43	252	DOI ↗
2026-05-28	Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior Daniel Wurgaft · Can Rager · Matthew Kowal · Vasudev Shyam +12	130	0	arXiv ↗
2026-05-28	Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9	29	0	arXiv ↗
2026-05-28	Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts Sheridan Feucht · Tal Haklay · Usha Bhalla · Daniel Wurgaft +8	68	0	arXiv ↗
2026-05-28	Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought Siddharth Boppana · Annabel Ma · Max Loeffler · Raphaël Sarfati +4	45	0	arXiv ↗
2026-05-28	AI: a Bridge toward Diverse Intelligence and Humanity’s Future Michael Levin	38	1	DOI ↗
2026-05-28	There is no self-evidence: A physics of emptiness realisation Lars Sandved-Smith · Chris Fields · Thomas Doctor · Ruben Laukkonen +1	—	0	DOI ↗
2026-05-28	Steering Along Manifolds to Control Neural Networks	—	—
2026-05-28	Koan Battery: Measuring Reflective Mode Accessibility in AI Anton Borzov	—	—
2026-05-23	Active Inference, Curiosity and Insight Karl Friston · Marco Lin · Chris Frith · Giovanni Pezzulo +2	120	377	DOI ↗
2026-05-21	Topological constraints on self-organization in locally interacting systems Francesco Sacco · Dalton Sakthivadivel · Michael Levin	84	0	DOI ↗
2026-05-21	AI as a Buddhist Self-Overcoming Technique in Another Medium Primož Krašovec	2	3	DOI ↗
2026-05-21	Brains and where else? Mapping theories of consciousness to unconventional embodiments Nicolas Rouleau · Michael Levin	206	1	DOI ↗
2026-05-21	Multiple ways to implement and infer sentience	15	4	DOI ↗
2026-05-21	The biogenic approach to cognition Pamela Lyon	138	247	DOI ↗
2026-05-21	Technological Approach to Mind Everywhere: An Experimentally-Grounded Framework for Understanding Diverse Bodies and Minds Michael Levin	377	174	DOI ↗
2026-05-21	Living Things Are Not (20th Century) Machines: Updating Mechanism Metaphors in Light of the Modern Science of Machine Behavior Joshua Bongard · Michael Levin	172	99	DOI ↗
2026-05-21	Collective intelligence: A unifying concept for integrating biology across scales and substrates Patrick McMillen · Michael Levin	229	94	DOI ↗
2026-05-21	A tale of two densities: active inference is enactive inference Maxwell J. D. Ramstead · Michael D. Kirchhoff · Karl J. Friston	63	225	DOI ↗
2026-05-21	Active inference on discrete state-spaces: a synthesis Lancelot Da Costa · Thomas Parr · Noor Sajid · Sebastijan Veselic +2	132	212	DOI ↗
2026-05-21	Life as we know it Karl Friston	60	685	DOI ↗
2026-05-21	Cognitive glues are shared models of relative scarcities: the economics of collective intelligence Michael Levin · Benjamin Lyons	193	0	DOI ↗
2026-05-21	A Free energy principle for the brain (lecture summary) Karl Friston	—	8	DOI ↗
2026-05-21	Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies Bing Yuan · Jiang Zhang · Aobo Lyu · Jiayun Wu +5	—	4	arXiv ↗
2026-05-21	Taking AI Welfare Seriously Robert Long · Jeff Sebo · Patrick Butlin · Kathleen Finlinson +6	—	21	arXiv ↗
2026-05-20	Interpreting Language Model Parameters Lucius Bushnaq · Dan Braun · Oliver Clive-Griffin · Bart Bussmann +4	66	804
2026-05-20	Developmental Bioelectricity: the cognitive glue enabling evolutionary scaling from physiology to mind Michael Levin	198	—	DOI ↗
2026-05-20	Emergent Introspective Awareness in Large Language Models Jack Lindsey	—	—	arXiv ↗
2026-05-20	Covariance-based Sequence Pooling Thomas Dooms · Nicholas K. Wang · Michael T. Pearce	—	—
2026-05-20	Verbalized Eval Awareness Inflates Measured Safety Santiago Aranguri · Joseph Bloom	—	—
2026-05-20	Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training Frank Xiao · Santiago Aranguri	29	0	arXiv ↗
2026-05-20	The World Inside Neural Networks Atticus Geiger · Ekdeep Singh Lubana · Thomas Fel · Jack Merullo +3	—	—
2026-05-20	Explaining 4.2 million genetic variants with state-of-the-art, interpretable predictions Michael Pearce · Thomas Dooms · Ryo Yamamoto · Joshua Meehl +18	—	—
2026-05-20	The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents Federico Pigozzi · Michael Levin	39	0	arXiv ↗
2026-05-20	Janus Information Flow Transformers 2025	—	—
2026-05-20	Understanding Christopher Alexander's Fifteen Properties via Visualization and Analysis Takashi Iba · Shingo Sakai	4	—
2026-05-20	Cybernetic Diagrams: Design Strategies for an Open Game Pedro L.A. Veloso	36	—	DOI ↗
2026-05-20	Elephant 2000: A Programming Language Based on Speech Acts John McCarthy	16	—
2026-05-20	Biology, Buddhism, and AI: Care as the Driver of Intelligence Thomas Doctor · Olaf Witkowski · Elizaveta Solomonova · Bill Duane +1	151	—	DOI ↗
2026-05-20	Ordered Sets and Complete Lattices: A Primer for Computer Science Hilary A. Priestley	17	—
2026-05-20	Towards a theory of conceptual design for software Daniel Jackson	30	—	DOI ↗
2026-05-20	2024 03 07 Stefan Lesser Kay 1984 Opening the Hood of a Word Processor.pdf 414587	—	—
2026-05-20	2022-08-21_Prabros._ACT2022_slides_4223.pdf_890d16	—	—
2026-05-20	Finger Exercises in Formal Concept Analysis Bernhard Ganter	—	—
2026-05-20	Denotational design with type class morphisms (extended version) Conal Elliott	—	—
2026-05-20	Genuinely Functional User Interfaces Antony Courtney · Conal Elliott	27	—
2026-05-20	Endless forms most beautiful 2.0: teleonomy and the bioengineering of chimaeric and synthetic organisms Wesley P. Clawson · Michael Levin	185	—	DOI ↗
2026-05-20	Denotational Design: from meanings to programs Conal Elliott	5	—
2026-05-20	Design for an Individual: Connectionist Approaches to the Evolutionary Transitions in Individuality Richard A. Watson · Michael Levin · Christopher L. Buckley	120	—	DOI ↗
2026-05-20	The collective intelligence of evolution and development Richard Watson · Michael Levin	32	—	DOI ↗
2026-05-20	ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both Ziyu Guo · Rain Liu · Xinyan Chen · Pheng-Ann Heng	48	0	arXiv ↗
2026-05-19	The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text? Murray Shanahan · T. P. Das · Robert A. F. Thurman	44	—	arXiv ↗
2026-05-19	Consciousness in Artificial Intelligence: Insights from the Science of Consciousness Patrick Butlin · Robert P. Long · Eric Elmoznino · Yoshua Bengio +15	200	—	arXiv ↗
2026-05-19	CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence Yuntao Bai · Saurav Kadavath · Sandipan Kundu · Amanda Askell +47	63	—	arXiv ↗
2026-05-19	Simulators — LessWrong	—	—
2026-05-19	Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders Ruikang Zhang · Shuo Wang · Q. Su	37	—	arXiv ↗
2026-05-18	SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents Xuan-Phi Nguyen · Shrey Pandit · Revanth Gangi Reddy · Aimin Xu +3	49	0	arXiv ↗
2026-05-18	Multimodal Chain-of-Thought Reasoning in Language Models Zhuosheng Zhang · Aston Zhang · Mu Li · Hai Zhao +2	66	97	arXiv ↗
2026-05-18	Topological constraints on self-organisation in locally interacting systems Francesco Sacco · Dalton A R Sakthivadivel · Michael Levin	89	0	arXiv ↗
2026-05-18	The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring Edward Yi Chang · Zeyneb N. Kaya · Ethan Chang	26	—	arXiv ↗
2026-05-18	Anima Labs Phenomenology Pt1	—	—
2026-05-18	Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations	—	—
2026-05-18	Paper Summary: Interpreting Language Model Parameters	—	—
2026-05-18	Learning without neurons in physical systems Menachem Stern · Arvind Murugan	133	—	arXiv ↗