finding
active
finding:giving-models-janus-s-thread-extends-reconstruction-accuracy-distribution-tails-in-both-directions
Giving models janus's thread extends reconstruction accuracy distribution tails in both directions
Sauers' study: exposing models to janus's post extended both positive and negative extremes of reconstruction accuracy.
Source paper
extracted_from
Neighborhood: ranked by edge-count
Claims (1)
claim
- Janus's mathematical claim about exponential path combinatorics in transformers.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Statistically rigorous analysis of Claude introspection; suggests models may have latent introspective capabilities that can be enhanced or disrupted.
- Mechanistic evidence that the network actively attenuates injected perturbations, explaining late-layer introspection failure.
- Key methodological claim: MM probes are both competitive in accuracy and superior in causal influence
- Formal analysis showing the theoretical limitation of model stitching as a similarity measure.
- Supported by the geometric transition visible in cosine similarity heatmaps for F0-F3.
- Replicates main result using in-distribution steering vector; addresses concern about pre-trained vector validity.
- Shows that the model's persona position is determined primarily by the most recent user message, not by prior drift.
- Key quote connecting path redundancy to interferometric information encoding.