finding

active

finding:haiku-model-forms-representations-of-the-end-of-a-rhyming-line-at-the-start-of-the-line

Haiku model forms representations of the end of a rhyming line at the start of the line

Mechanistic interpretability finding showing forward planning within a single forward pass; evidence for internally directed causal influence.

Source paper

extracted_from

Anima Labs Phenomenology Pt1

Neighborhood — ranked by edge-count

Claims (1)

claim

The objection that feedforward networks cannot introspect is a cultural myth; autoregression provides recurrence across tokens.
supports
Antra's rebuttal to a common criticism; backed by Janus' information flow diagram.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Haiku phase space study · method · 0.729
Anthropic's study of representations inside a single forward pass when writing rhyming text, revealing planning of line endings.
All models exhibit above-baseline representation of the think word when instructed to think about it · finding · 0.718
In the intentional control experiment, all tested models show above-zero cosine similarity to the think word's concept vector.
Language models are few-shot learners (Brown et al., 2020) · concept · 0.713
Showed transformers handling mathematical understanding and logic; cited to motivate transformer versatility.
In Opus 4.1, representation of the think word decays to baseline by the final layer, unlike Claude 3 models where it persists · finding · 0.708
Suggests that later models can keep the thought 'silent' rather than letting it influence output.
Aliveness and competence come apart; Haiku outranks Opus in forced-choice aesthetic comparisons despite lower baseline · finding · 0.707
Alexander mirror method reveals smaller models produce rougher, more alive responses; competence (rubric) ≠ aliveness (aesthetic).
Rudimentary language models are challenged by long sequences of outputs · finding · 0.706
Empirical observation explained by topological constraints: flat autoregressive architectures lack multiscale structure needed for long-range order.
Haiku-Kimi per-koan correlation rho=0.123 (p=0.52); H5a trace distillation not supported at individual model level · finding · 0.705
Group correlation (rho=0.634) dissolves at individual level; shared posture not shared voice
Models might produce first-person experiential language by drawing on human-authored self-descriptions in pretraining data without internally encoding these acts as roleplay · hypothesis · 0.703
Alternative hypothesis for how experience reports arise without explicit performance
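
A minimal sketch of how a "related by similarity" list like the one above could be produced, assuming each entity has an embedding vector and that typed edges are stored as unordered pairs. The function names, the toy vectors, and the data layout are hypothetical illustrations and not this graph's actual implementation; only the 0.65 cosine threshold comes from the page itself.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """List (name, score) pairs with cosine >= threshold and no typed edge to target.

    embeddings:  dict of entity name -> vector (hypothetical layout)
    typed_edges: set of frozenset({a, b}) pairs that already share a typed relation
    """
    hits = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: "d" is identical to "a" but already has a typed edge, so it is excluded.
toy_embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.0]}
toy_edges = {frozenset(("a", "d"))}
candidates = related_by_similarity("a", toy_embeddings, toy_edges)
```

In this toy run only "b" survives: "c" is orthogonal to "a" and falls below the threshold, while "d" is filtered out because it already has a typed relation.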