finding
active
finding:haiku-model-forms-representations-of-the-end-of-a-rhyming-line-at-the-start-of-the-line
Haiku model forms representations of the end of a rhyming line at the start of the line
Mechanistic interpretability finding showing forward planning within a single forward pass; evidence for internally-directed causal influence.
Source paper
extracted_from
Neighborhood (ranked by edge-count)
Claims (1)
claim
- Antra's rebuttal to a common criticism; backed by Janus' information flow diagram.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Anthropic's study of representations inside a single forward pass when writing rhyming text, revealing planning of line endings.
- All models exhibit above-baseline representation of the think word when instructed to think about it (finding, 0.718): In the intentional control experiment, all tested models show above-zero cosine similarity to the think word's concept vector.
- Demonstrated transformers on mathematical understanding and logic; cited to motivate transformer versatility.
- Suggests that later models can keep the thought 'silent' rather than letting it influence output.
- Alexander mirror method reveals smaller models produce rougher, more alive responses; competence (rubric) ≠ aliveness (aesthetic).
- Empirical observation explained by topological constraints: flat autoregressive architectures lack multiscale structure needed for long-range order.
- Group correlation (rho=0.634) dissolves at the individual level: shared posture, not shared voice.
- Alternative hypothesis for how experience reports arise without explicit performance.
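
The "cosine ≥ 0.65 · no typed edge" rule that selects the entries above can be sketched roughly as follows. This is a minimal illustration, not the graph's actual implementation: the entity ids, the three-dimensional toy embeddings, and the function names `cosine` and `untyped_neighbors` are all hypothetical, and only the 0.65 threshold comes from the page itself.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs similar to the target but lacking a typed edge.

    embeddings:  dict mapping entity id -> vector (hypothetical toy data below)
    typed_edges: set of (id_a, id_b) pairs that already have a typed relation
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip entities already linked by a typed edge in either direction.
        if (target_id, other_id) in typed_edges or (other_id, target_id) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Highest similarity first.
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical toy data, for illustration only.
embeddings = {
    "rhyme-planning": [1.0, 0.0, 0.0],
    "think-word": [0.8, 0.6, 0.0],     # similar, no typed edge
    "source-paper": [1.0, 0.1, 0.0],   # similar, but already has a typed edge
    "unrelated": [0.0, 0.0, 1.0],      # orthogonal, below the threshold
}
typed_edges = {("rhyme-planning", "source-paper")}

neighbors = untyped_neighbors("rhyme-planning", embeddings, typed_edges)
```

With this toy data only `think-word` is returned: `source-paper` is excluded because it already has a typed edge, and `unrelated` falls below the threshold. Entries surfaced this way would still need manual review before being promoted to typed edges or merged as duplicates.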