paper:arxiv-2310-01405 Representation engineering: A top-down approach to AI transparency
Related work: refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.
- Towards AI Transparency and Accountability: A Global Framework for Exchanging Information on AI Systems | Adrian Byrne, Nicholas Perello, Cyrus Cousins, Taha Yasseri, Yair Zick, Przemyslaw Grabowicz, Warren Buckley | 2026 | ≈ 81%
- Engineering.ai: A Platform for Teams of AI Engineers in Computational Design | Yupeng Qi, Jingsen Feng, Xu Chu, Ran Xu | 2025 | ≈ 79%
- Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models | Xuntao Lyu, Meng Liu, Hongyi Wang, Ang Li, Bowei Tian | 2025 | ≈ 79%
- Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models | Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz, Jan Wehner | 2025 | ≈ 78%
- TRACE: A Metrologically-Grounded Engineering Framework for Trustworthy Agentic AI Systems in Operationally Critical Domains | Serhii Zabolotnii | 2026 | ≈ 77%
- Agentic AI in Engineering and Manufacturing: Industry Perspectives on Utility, Adoption, Challenges, and Opportunities | Maxwell Bauer, Claire Jacquillat, A. John Hart, Faez Ahmed, Kristen M. Edwards | 2026 | ≈ 76%
- Socio-technical aspects of Agentic AI | Alaa Saleh, Ying Li, Shubham Vaishnav, Kai Fang, Hailin Feng, Yuchao Xia, Thippa Reddy Gadekallu, Qiyang Zhang, Xiaodan Shi, Ali Beikmohammadi, Sindri Magnússon, Ilir Murturi, Chinmaya Kumar Dehury, Marcin Paprzycki, Lauri Loven, Sasu Tarkoma, Schahram Dustdar, Praveen Kumar Donta | 2026 | ≈ 76%
- A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows | Ross Gore, Peter Foytik, Sachin Shetty, Ravi Mukkamala, Abdul Rahman, Xueping Liang, Safdar H. Bouk, Amin Hass, Sachini Rajapakse, Ng Wee Keong, Kasun De Zoysa, Aruna Withanage, Nilaan Loganathan, Eranga Bandara | 2025 | ≈ 76%
- Artificial Intelligence for Collective Intelligence: A National-Scale Research Strategy | Nirav Ajmeri (1), Mike Batty (2), Michaela Black (3), John Cartlidge (1), Robert Challen (1), Cangxiong Chen (4), Jing Chen (5), Joan Condell (3), Leon Danon (1), Adam Dennett (2), Alison Heppenstall (6), Paul Marshall (1), Phil Morgan (5), Aisling O'Kane (1), Laura G. E. Smith (4), Theresa Smith (4), Hywel T. P. Williams (7), Seth Bullock (1); affiliations: (1) University of Bristol, (2) University College London, (3) Ulster University, (4) University of Bath, (5) Cardiff University, (6) University of Glasgow, (7) University of Exeter | 2024 | ≈ 76%
- From Junior to Senior: Allocating Agency and Navigating Professional Growth in Agentic AI-Mediated Software Engineering | Bhada Yun, April Yi Wang, Dana Feng | 2026 | ≈ 75%
- Describing Agentic AI Systems with C4: Lessons from Industry Projects | Stefan Wittek, Andreas Rausch | 2026 | ≈ 75%
- Automotive Engineering-Centric Agentic AI Workflow Framework | Zhihao Liu, Piero Brigida, Yerlan Akhmetov, Gurudevan Devarajan, Kai Liu, Ajinkya Bhave, Tong Duy Son | 2026 | ≈ 75%
- Taking AI Welfare Seriously | in corpus | 2024 | ≈ 70%
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? | in corpus | 2025 | ≈ 70%
- Cognitive glues are shared models of relative scarcities: the economics of collective intelligence | in corpus | 2026 | ≈ 70%
- Interpreting Language Model Parameters | in corpus | 2026 | ≈ 70%
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations | in corpus | 2023 | ≈ 70%
- Towards a theory of conceptual design for software | in corpus | 2015 | ≈ 69%
- Collective intelligence: A unifying concept for integrating biology across scales and substrates | in corpus | 2024 | ≈ 69%
Similar preprints — Semantic Scholar
Cited by (5)
- Endogenous Resistance to Activation Steering in Language Models
- ReflCtrl: Controlling LLM Reflection via Representation Engineering
ReflCtrl demonstrates that self-reflection in reasoning LLMs is governed by an identifiable direction in latent representation space and that suppressing this direction via stepwise steering can reduc
- Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
Quantitative introspection—the causal coupling between an instruction-tuned LLM's numeric self-report and a probe-defined internal emotive direction—is demonstrably present in models as small as LLaMA
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Self-Other Overlap (SOO) fine-tuning, a method that minimizes the Mean Squared Error between a model's internal activations when processing self-referencing versus other-referencing inputs, reduces de
- Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training
Probe-based data attribution, introduced here as a method for surfacing and mitigating undesirable post-training behaviors, reduces harmful compliance in OLMo 2 7B by 63% through datapoint filtering a