paper
active
2023
paper:2022-04-30-stefan-lesser-prog22-master-pdf-978acd

Technical Dimensions of Programming Systems

By Joel Jakubovic · Jonathan Edwards · Tomas Petricek

TL;DR

Programming systems research has lacked a common analytic vocabulary comparable to what exists for programming languages, leaving systems like Smalltalk, UNIX, HyperCard, and Jupyter evaluable only through personal impression rather than structured comparison. Jakubovic, Edwards, and Petricek address this directly by introducing the Technical Dimensions framework: a catalogue of qualitative design axes organized into 7 clusters (interaction, notation, conceptual structure, customizability, complexity, errors, and adoptability), each characterized by two extreme positions rather than scalar scores. The framework is derived from qualitative analysis of landmark systems spanning three reference classes (language-based ecosystems, OS-likes, and application-focused systems) and demonstrated in two concrete applications. The first is a dimensional analysis of the Dark programming system, showing how its single integrated mode, which collapses development, debugging, and cloud deployment, constitutes a specific position on the feedback-loops and modes-of-interaction dimensions. The second is a design-space exploration plotting 10 systems on self-sustainability versus notational diversity axes using a binary yes/no scoring method (detailed in Appendix A), which reveals a conspicuous blank region combining high values on both dimensions: a gap occupied by no existing system, including COLAs, Boxer, or the Web. The paper argues that the gap is not structurally forbidden but an unrealized opportunity, and that the Technical Dimensions framework as a whole enables a Kuhnian 'normal science' for programming systems: filling in the design-space map so that researchers can stand on prior work rather than repeatedly rediscovering it in isolation.

What to take away

  1. The Technical Dimensions framework organizes programming system characteristics into 7 clusters (interaction, notation, conceptual structure, customizability, complexity, errors, adoptability), each defined by two qualitative extremes rather than quantitative scores.
  2. At the LIVE 2020 and LIVE 2021 workshops, 5/6 and 6/7 papers respectively presented new systems rather than analyzing existing ones, empirically demonstrating the field's inability to build cumulative knowledge.
  3. Plotting 10 systems (Haskell, Jupyter, Boxer, HyperCard, UNIX, Smalltalk, Lisp, spreadsheets, COLAs, and the Web) on self-sustainability versus notational diversity reveals a conspicuous empty region at high values of both dimensions, representing an unrealized design opportunity.
  4. The scoring method introduced in Appendix A converts each qualitative dimension into a small set of binary yes/no questions and sums the affirmative answers to produce coordinates; for self-sustainability the questions include whether programs can generate and execute programs, whether changes persist indefinitely, and whether low-level infrastructure can be reprogrammed from within the running system.
  5. Dark's primary technical innovation, from a Technical Dimensions perspective, is collapsing development, debugging, and cloud operation into a single integrated mode of interaction, which the framework maps to the modes-of-interaction and feedback-loops dimensions and traces genealogically to Smalltalk's image-based environment.
  6. The self-sustainability dimension distinguishes systems by whether user-level programming can progressively replace implementation-level components without stepping outside the system, with COLAs scoring highest (5/5) and Haskell scoring lowest (0/5) among the plotted systems.
  7. CSS is identified as a concrete substrate instantiating the additive authoring property: its selector-based addressing mechanism allows arbitrary behavioral override by addition rather than modification, without requiring destructive access to existing declarations.
  8. An open hypothesis the paper raises is whether Deep Learning represents a qualitatively new level of automation or merely the latest instance of a recurring pattern in which 'automatic programming' is always a euphemism for programming in a higher-level language than previously available.
  9. A researcher replicating the design-space exploration method should generate binary yes/no questions for each dimension by anchoring them to a small set of example systems whose intuitive placement is already agreed upon, stop adding questions when the important distinctions between anchor points are captured, and treat disagreements among raters as signals to revise question formulation rather than answer coding.
  10. The framework explicitly absorbs and repositions several prior concepts, namely Cognitive Dimensions of Notation (Green & Petre 1996), levels of liveness (Tanimoto 2013), and pluralism/communicativity (Kell 2017), as special cases or sub-dimensions, arguing that notational analysis alone (Cognitive Dimensions) leaves the majority of a system's design space uncharacterized.
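
The scoring and replication steps in the takeaways above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `score` and `disputed_questions`, the two raters, and their answers are all hypothetical. Only the three self-sustainability questions come from the summary; the scores out of 5 suggest there are two further questions, which are not listed here and are therefore left out.

```python
from typing import Dict, List

# The three self-sustainability questions named in the summary.
# Scores are reported out of 5, so two more questions presumably exist;
# they are not listed in this summary and are deliberately omitted.
SELF_SUSTAINABILITY_QUESTIONS: List[str] = [
    "Can programs generate and execute programs?",
    "Do changes persist indefinitely?",
    "Can low-level infrastructure be reprogrammed from within the running system?",
]


def score(answers: Dict[str, bool], questions: List[str]) -> int:
    """Sum the affirmative answers to get one coordinate for a system."""
    return sum(1 for q in questions if answers[q])


def disputed_questions(
    ratings: List[Dict[str, bool]], questions: List[str]
) -> List[str]:
    """Return the questions on which raters disagree.

    Per the replication advice, these are candidates for rewording,
    not for re-coding the answers.
    """
    return [q for q in questions if len({r[q] for r in ratings}) > 1]


# Hypothetical answers from two hypothetical raters about a made-up system.
rater_1 = dict(zip(SELF_SUSTAINABILITY_QUESTIONS, [True, True, False]))
rater_2 = dict(zip(SELF_SUSTAINABILITY_QUESTIONS, [True, False, False]))

print(score(rater_1, SELF_SUSTAINABILITY_QUESTIONS))
print(disputed_questions([rater_1, rater_2], SELF_SUSTAINABILITY_QUESTIONS))
```

Keeping the question list separate from the answers makes it easy to revise a question's wording and rescore every system, which is the loop the replication advice describes.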

Peer brief — for seminar discussion

Jakubovic, Edwards, and Petricek identify the core problem as follows: while programming language research has decades of shared vocabulary, formal semantics, and comparative methods, the broader class of programming systems (including Smalltalk, UNIX, Jupyter notebooks, HyperCard, spreadsheets, and Dark) can only be evaluated impressionistically, making it impossible to situate new work against prior work or to identify what genuinely advances the state of the art. The response is the Technical Dimensions framework, a catalogue of qualitative design axes grouped into 7 clusters (interaction, notation, conceptual structure, customizability, complexity, errors, adoptability), each bounded by two characteristic extremes. Dimensions are derived through qualitative analysis of roughly a dozen landmark systems spanning language-based ecosystems, OS-like systems, and application-focused systems; the method is explicitly aligned with the 'evaluating programming systems' stance of Edwards et al. (PPIG 2019) and with Chang's complementary science as an alternative to pure empirical evaluation.

The load-bearing finding is twofold. First, the dimensions provide sufficient resolution to perform a structured analysis of Dark, identifying its collapsing of development, debugging, and cloud operation into a single mode as the primary design move, and naming its use of live request data to drive handler construction as 'Error-Driven Development' with a traceable precedent in the PILOT system for Lisp (Teitelman 1966). Second, plotting 10 systems on self-sustainability versus notational diversity using a binary-question scoring method (Appendix A) reveals a structurally empty region at high values on both axes, a gap occupied by neither COLAs (high sustainability, low notational diversity, scoring 5 and 1 respectively) nor Boxer or the Web (high notational diversity, low sustainability).

The paper argues this gap is not architecturally forbidden and constitutes an actionable design target, one the first author's forthcoming dissertation aims to occupy. On this reading, the Technical Dimensions framework can function as what the paper calls a Kuhnian 'normal science' instrument: systematically filling in a design-space map so that future system builders can identify unexplored positions rather than repeatedly rediscovering the same motivating examples (Smalltalk, Bret Victor, spreadsheets). An alternative methodological approach the paper could have taken is empirical user studies or controlled experiments, but it explicitly declines this in favor of qualitative holistic analysis, citing Olsen's UIST evaluation heuristics as an analogous precedent in HCI.

A critical reader would push back on the scoring method in Appendix A. The binary yes/no questions used to generate coordinates are generated informally by the three authors to roughly match their own prior intuitions about where certain systems sit; the authors acknowledge this explicitly ('we were trying to make those intuitions more precise'). This means the resulting scatterplot is not an independent confirmation of the framework's validity but a visualization of the authors' prior beliefs made more explicit. The empty region in the design space, the paper's most concrete empirical claim, is therefore an artifact of the chosen question set, the chosen systems, and the chosen axes, not a theory-neutral discovery. A skeptical reader could also contest whether 'self-sustainability' and 'notational diversity' are genuinely independent dimensions or whether the apparent gap reflects a functional constraint (highly self-sustainable systems tend toward uniform internal representations, which conflicts with notational diversity) that the qualitative framework lacks the precision to capture. The paper itself notes this possibility but dismisses it on intuitive grounds without formal argument.
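
To make concrete the critique that the empty region depends on which systems are plotted, here is a minimal sketch of how such a gap could be detected from scored coordinates. It is not the paper's method. The helper `empty_cells`, the threshold of 4, and the assumed maximum of 5 on the notational diversity axis are all assumptions made for illustration, and `system_a` and `system_b` are invented placeholders. Only the COLAs coordinates (5, 1) are taken from the brief.

```python
from typing import Dict, List, Tuple

# (self_sustainability, notational_diversity)
Point = Tuple[int, int]


def empty_cells(
    points: Dict[str, Point], max_x: int, max_y: int, min_x: int, min_y: int
) -> List[Point]:
    """List grid cells with x >= min_x and y >= min_y that no system occupies."""
    occupied = set(points.values())
    return [
        (x, y)
        for x in range(min_x, max_x + 1)
        for y in range(min_y, max_y + 1)
        if (x, y) not in occupied
    ]


systems: Dict[str, Point] = {
    "COLAs": (5, 1),     # from the brief: scores 5 and 1 respectively
    "system_a": (1, 4),  # hypothetical placeholder, not a real score
    "system_b": (2, 2),  # hypothetical placeholder, not a real score
}

# Assumption: both axes run from 0 to 5, and "high" means 4 or above.
gap = empty_cells(systems, max_x=5, max_y=5, min_x=4, min_y=4)
print(gap)

# Adding one hypothetical system at (5, 5) shrinks the gap, which is the
# critique in miniature: the gap is a property of the chosen sample.
systems["system_c"] = (5, 5)
print(empty_cells(systems, max_x=5, max_y=5, min_x=4, min_y=4))
```

The same function reports a different gap whenever the question set, the thresholds, or the list of systems changes, so the result reads as a description of the sample rather than of the design space itself.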

Cross-corpus bridges (4)

same_concept_as · Nomic cosine

External markdown files that talk about the same concept as this entity.

  • alexander
    The Art, Science, and Engineering of Programming · papers/extracted/2022-04-30_Stefan-Lesser_prog22-master.pdf_978acd.md · 0.850
  • aboutblank_kb
    Frameworks Comparison · synthesis/frameworks-comparison.md · 0.806
  • alexander
    Frameworks Comparison · applied/from-research-stack/frameworks-comparison.md · 0.798
  • alexander
    Towards a Theory of Conceptual Design for Software · papers/extracted/2023-03-09_Stefan-Lesser_concept-essay.pdf_161cb7.md · 0.786