paper:doi-10-1145-1294211-1294256
Evaluating user interface systems research
Original abstract
The development of user interface systems has languished with the stability of desktop computing. Future systems, however, that are off-the-desktop, nomadic or physical in nature will involve new devices and new software systems for creating interactive applications. Simple usability testing is not adequate for evaluating complex systems. The problems with evaluating systems work are explored and a set of criteria for evaluating new UI systems work is presented.
Related work: refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.
- Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions. Jay Nanavati, Khaldoun Zine El Abidine, Lenon Mendes, Christina Mack, Mourad Gridach. 2025. ≈ 66%
- Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare. Harshit Rajgarhia, Shivali Dalmia, Ananya Mantravadi, Prasanna Desikan. 2026. ≈ 66%
- URSA: The Universal Research and Scientific Agent. Nathan Debardeleben, Russell Bent, Rahul Somasundaram, Isaac Michaud, Arthur Lui, Alexius Wadell, Warren D. Graham, Golo A Wimmer, Sachin Shivakumar, Joan Vendrell Gallart, Harsha Nagarajan, Earl Lawrence, Michael Grosskopf. 2026. ≈ 65%
- Sentiment Analysis for Education with R: packages, methods and practical applications. Alessia Forciniti, Germana Scepi, Maria Spano, Michelangelo Misuraca. 2026. ≈ 64%
- SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks. Kaiyan Zhang, Tiansheng Hu, Sihong Wu, Ronan Le Bras, Charles McGrady, Taira Anderson, Jonathan Bragg, Joseph Chee Chang, Jesse Dodge, Matt Latzke, Yixin Liu, Xiangru Tang, Zihang Wang, Chen Zhao, Hannaneh Hajishirzi, Doug Downey, Arman Cohan, Yilun Zhao. 2026. ≈ 64%
- Technical Dimensions of Programming Systems (in corpus). 2023. ≈ 64%
- Recommender systems, stigmergy, and the tyranny of popularity. Paul E. Smaldino, Zackary Okun Dunivin. 2025. ≈ 63%
- Estimating the Value of Evidence-Based Decision Making. Anish Agarwal, Guido Imbens, Siwei Jia, James McQueen, Serguei Stepaniants, Santiago Torres, Alberto Abadie. 2026. ≈ 63%
- AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems. Roberto Pellungrini, Mattia Setzu, Fosca Giannotti, Dino Pedreschi, Clara Punzi. 2026. ≈ 63%
- AI, Humans, and Data Science: Optimizing Roles Across Workflows and the Workforce. Richard Timpone and Yongwei Yang. 2025. ≈ 63%
- Evaluating Language Model Agency through Negotiations. Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West, Tim R. Davidson. 2026. ≈ 63%
- A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar. Henry Muccini, Karthik Vaidhyanathan, Marcos Kalinowski, Michele Albano, Antonio Pedro Santos Alves, Renato Cerqueira, Mateus Devino, Matteo Esposito, Rodrigo Falcão, Vinicius Henning, Foutse Khomh, Valentina Lenarduzzi, Qinghua Lu, Matías Martínez, Henrique Mello, Daniel Mendez, Lucas Romao, Davide Taibi. 2026. ≈ 63%
- Reproducible, Explainable, and Effective Evaluations of Agentic AI for Software Engineering. André Storhaug, Jingyue Li. 2026. ≈ 62%
- Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals. Bipasha Banerjee, Edward A. Fox, William A. Ingram. 2025. ≈ 62%
- Towards a theory of conceptual design for software (in corpus). 2015. ≈ 60%
- Model Alignment Search (in corpus). 2025. ≈ 58%
- Opening the Hood of a Word Processor (in corpus). 1984. ≈ 57%
- Verbalized Eval Awareness Inflates Measured Safety (in corpus). 2026. ≈ 56%
Cited by (1)
- Technical Dimensions of Programming Systems
Programming systems research has lacked a common analytic vocabulary comparable to what exists for programming languages, leaving systems like Smalltalk, UNIX, HyperCard, and Jupyter evaluable only th