Evaluating user interface systems research

By Dan R. Olsen

Original abstract

The development of user interface systems has languished with the stability of desktop computing. Future systems, however, that are off-the-desktop, nomadic or physical in nature will involve new devices and new software systems for creating interactive applications. Simple usability testing is not adequate for evaluating complex systems. The problems with evaluating systems work are explored and a set of criteria for evaluating new UI systems work is presented.

Related work: refs + corpus + external arXiv

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.

User Simulation for Evaluating Information Access Systems
Krisztian Balog and ChengXiang Zhai
2026
≈ 72%
Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions
Jay Nanavati, Khaldoun Zine El Abidine, Lenon Mendes, Christina Mack, Mourad Gridach
2025
≈ 66%
Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare
Harshit Rajgarhia, Shivali Dalmia, Ananya Mantravadi, Prasanna Desikan
2026
≈ 66%
URSA: The Universal Research and Scientific Agent
Nathan Debardeleben, Russell Bent, Rahul Somasundaram, Isaac Michaud, Arthur Lui, Alexius Wadell, Warren D. Graham, Golo A Wimmer, Sachin Shivakumar, Joan Vendrell Gallart, Harsha Nagarajan, Earl Lawrence, Michael Grosskopf
2026
≈ 65%
Sentiment Analysis for Education with R: packages, methods and practical applications
Alessia Forciniti, Germana Scepi, Maria Spano, Michelangelo Misuraca
2026
≈ 64%
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
Kaiyan Zhang, Tiansheng Hu, Sihong Wu, Ronan Le Bras, Charles McGrady, Taira Anderson, Jonathan Bragg, Joseph Chee Chang, Jesse Dodge, Matt Latzke, Yixin Liu, Xiangru Tang, Zihang Wang, Chen Zhao, Hannaneh Hajishirzi, Doug Downey, Arman Cohan, Yilun Zhao
2026
≈ 64%
Technical Dimensions of Programming Systems
in corpus
2023
≈ 64%
Recommender systems, stigmergy, and the tyranny of popularity
Paul E. Smaldino, Zackary Okun Dunivin
2025
≈ 63%
Estimating the Value of Evidence-Based Decision Making
Anish Agarwal, Guido Imbens, Siwei Jia, James McQueen, Serguei Stepaniants, Santiago Torres, Alberto Abadie
2026
≈ 63%
AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems
Roberto Pellungrini, Mattia Setzu, Fosca Giannotti, Dino Pedreschi, Clara Punzi
2026
≈ 63%
AI, Humans, and Data Science: Optimizing Roles Across Workflows and the Workforce
Richard Timpone and Yongwei Yang
2025
≈ 63%
Evaluating Language Model Agency through Negotiations
Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West, Tim R. Davidson
2026
≈ 63%
A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar
Henry Muccini, Karthik Vaidhyanathan, Marcos Kalinowski, Michele Albano, Antonio Pedro Santos Alves, Renato Cerqueira, Mateus Devino, Matteo Esposito, Rodrigo Falcão, Vinicius Henning, Foutse Khomh, Valentina Lenarduzzi, Qinghua Lu, Matías Martínez, Henrique Mello, Daniel Mendez, Lucas Romao, Davide Taibi
2026
≈ 63%
Reproducible, Explainable, and Effective Evaluations of Agentic AI for Software Engineering
André Storhaug, Jingyue Li
2026
≈ 62%
Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals
Bipasha Banerjee, Edward A. Fox, William A. Ingram
2025
≈ 62%
From Tool to Teacher: Rethinking Search Systems as Instructive Interfaces
David Elsweiler
2026
≈ 62%
Towards a theory of conceptual design for software
in corpus
2015
≈ 60%
Toward an ethics of autopoietic technology: Dukkha, care, and Intelligence
in corpus
2023
≈ 58%
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
in corpus
2026
≈ 58%
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
in corpus
2024
≈ 58%
Model Alignment Search
in corpus
2025
≈ 58%
Opening the Hood of a Word Processor
in corpus
1984
≈ 57%
AI: a Bridge toward Diverse Intelligence and Humanity’s Future
in corpus
2024
≈ 57%
Verbalized Eval Awareness Inflates Measured Safety
in corpus
2026
≈ 56%
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
in corpus
2024
≈ 56%
Technological Approach to Mind Everywhere: An Experimentally-Grounded Framework for Understanding Diverse Bodies and Minds
in corpus
2022
≈ 56%

Cited by (1)

Technical Dimensions of Programming Systems
Programming systems research has lacked a common analytic vocabulary comparable to what exists for programming languages, leaving systems like Smalltalk, UNIX, HyperCard, and Jupyter evaluable only th