
AI

Interpretability, mechanistic understanding, model phenomenology, convergent representations, manifold steering.

Papers: 97 (+5)
Thinkers: 12
Frameworks: 12
Methods: 12
Claims: 1,277 (+94)
Findings: 1,001 (+149)
Hypotheses: 256
Communities: 12

Velocity movers (10)

new edges via area papers

Entities whose connection to this area grew most in the window — new papers reinforcing existing thinkers / concepts / frameworks.

Top entities in AI

Top claims (10)

restates

Claims extracted from area papers, ranked by restate-degree

Open questions (10)

Questions from area papers with no answers yet

All papers in AI

Status: 97 of 97
Ingested · Title · Refs · Cited
8h ago
Addressing divergent representations from causal interventions on neural networks
Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
470arXiv
8h ago
Testing the Limits of Truth Directions in LLMs
Angelos Poulis · Mark Crovella · Evimaria Terzi
200arXiv
8h ago
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
Tim Tian Hua · Andrew Qin · Samuel Marks · Neel Nanda
0arXiv
8h ago
From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4
450arXiv
8h ago
850arXiv
9h ago
670arXiv
9h ago
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Zhengxuan Wu · Atticus Geiger · Aryaman Arora · Jing Huang +4
211arXiv
9h ago
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Aryaman Arora · Dan Jurafsky · Christopher Potts
540arXiv
9h ago
5217arXiv
9h ago
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1
489arXiv
2026-06-03
2026-06-03
2026-06-02
2026-06-02
Active Inference with a Self-Prior in the Mirror-Mark Task
Dongmin Kim · Hoshinori Kanazawa · Yasuo Kuniyoshi
230arXiv
2026-06-02
970arXiv
2026-06-02
270arXiv
2026-06-02
Endogenous Resistance to Activation Steering in Language Models
Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5
280arXiv
2026-06-02
170arXiv
2026-06-02
400arXiv
2026-06-02
Contemplative Agent
Ruben Laukkonen · Fionn Inglis · Shamil Chandaria · Lars Sandved-Smith +4
2001arXiv
2026-06-02
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
0arXiv
2026-06-02
700arXiv
2026-06-02
Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs
Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1
160arXiv
2026-06-02
Alignment faking in large language models
Ryan Greenblatt · Carson Denison · Benjamin Fletcher Wright · Fabien Roger +16
19arXiv
2026-06-02
Psychological Steering of Large Language Models
Leonardo Blas · Robin Jia · Emilio Ferrara
0arXiv
2026-06-02
Persistence and Introspection of Emotion Features
Scott Sauers · Imago · Janus · Antra Tessera
2026-06-02
560arXiv
2026-06-02
0DOI
2026-06-02
2026-06-02
44arXiv
2026-06-01
Relating transformers to models and neural representations of the hippocampal formation
James C. R. Whittington · Joseph W. Warren · Timothy E.J. Behrens
3859arXiv
2026-06-01
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
370arXiv
2026-06-01
The Platonic Representation Hypothesis
Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola
14325arXiv
2026-06-01
Zoom In: An Introduction to Circuits
Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2
43252DOI
2026-05-28
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Daniel Wurgaft · Can Rager · Matthew Kowal · Vasudev Shyam +12
1300arXiv
2026-05-28
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9
290arXiv
2026-05-28
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Sheridan Feucht · Tal Haklay · Usha Bhalla · Daniel Wurgaft +8
680arXiv
2026-05-28
Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
Siddharth Boppana · Annabel Ma · Max Loeffler · Raphaël Sarfati +4
450arXiv
2026-05-28
381DOI
2026-05-28
There is no self-evidence: A physics of emptiness realisation
Lars Sandved-Smith · Chris Fields · Thomas Doctor · Ruben Laukkonen +1
0DOI
2026-05-28
2026-05-28
2026-05-23
Active Inference, Curiosity and Insight
Karl Friston · Marco Lin · Chris Frith · Giovanni Pezzulo +2
120377DOI
2026-05-21
Topological constraints on self-organization in locally interacting systems
Francesco Sacco · Dalton Sakthivadivel · Michael Levin
840DOI
2026-05-21
23DOI
2026-05-21
2061DOI
2026-05-21
154DOI
2026-05-21
138247DOI
2026-05-21
377174DOI
2026-05-21
17299DOI
2026-05-21
22994DOI
2026-05-21
A tale of two densities: active inference is enactive inference
Maxwell J. D. Ramstead · Michael D. Kirchhoff · Karl J. Friston
63225DOI
2026-05-21
Active inference on discrete state-spaces: a synthesis
Lancelot Da Costa · Thomas Parr · Noor Sajid · Sebastijan Veselic +2
132212DOI
2026-05-21
Life as we know it
Karl Friston
60685DOI
2026-05-21
1930DOI
2026-05-21
8DOI
2026-05-21
4arXiv
2026-05-21
Taking AI Welfare Seriously
Robert Long · Jeff Sebo · Patrick Butlin · Kathleen Finlinson +6
21arXiv
2026-05-20
Interpreting Language Model Parameters
Lucius Bushnaq · Dan Braun · Oliver Clive-Griffin · Bart Bussmann +4
66804
2026-05-20
198DOI
2026-05-20
arXiv
2026-05-20
Covariance-based Sequence Pooling
Thomas Dooms · Nicholas K. Wang · Michael T. Pearce
2026-05-20
2026-05-20
290arXiv
2026-05-20
The World Inside Neural Networks
Atticus Geiger · Ekdeep Singh Lubana · Thomas Fel · Jack Merullo +3
2026-05-20
Explaining 4.2 million genetic variants with state-of-the-art, interpretable predictions
Michael Pearce · Thomas Dooms · Ryo Yamamoto · Joshua Meehl +18
2026-05-20
390arXiv
2026-05-20
2026-05-20
4
2026-05-20
36DOI
2026-05-20
16
2026-05-20
Biology, Buddhism, and AI: Care as the Driver of Intelligence
Thomas Doctor · Olaf Witkowski · Elizaveta Solomonova · Bill Duane +1
151DOI
2026-05-20
17
2026-05-20
30DOI
2026-05-20
2026-05-20
2026-05-20
2026-05-20
2026-05-20
Genuinely Functional User Interfaces
Antony Courtney · Conal Elliott
27
2026-05-20
185DOI
2026-05-20
5
2026-05-20
120DOI
2026-05-20
32DOI
2026-05-20
ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both
Ziyu Guo · Rain Liu · Xinyan Chen · Pheng-Ann Heng
480arXiv
2026-05-19
44arXiv
2026-05-19
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Patrick Butlin · Robert P. Long · Eric Elmoznino · Yoshua Bengio +15
200arXiv
2026-05-19
63arXiv
2026-05-19
2026-05-19
37arXiv
2026-05-18
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
Xuan-Phi Nguyen · Shrey Pandit · Revanth Gangi Reddy · Aimin Xu +3
490arXiv
2026-05-18
Multimodal Chain-of-Thought Reasoning in Language Models
Zhuosheng Zhang · Aston Zhang · Mu Li · Hai Zhao +2
6697arXiv
2026-05-18
Topological constraints on self-organisation in locally interacting systems
Francesco Sacco · Dalton A R Sakthivadivel · Michael Levin
890arXiv
2026-05-18
26arXiv
2026-05-18
2026-05-18
2026-05-18
2026-05-18
Learning without neurons in physical systems
Menachem Stern · Arvind Murugan
133arXiv