finding
active
finding:mistral-7b-on-false-belief-iit-4-0-is-the-sole-case-exhibiting-statistically-significant-differences-between-score-categories-under-temporal-permutation-at-the-task-level
Mistral-7B on False Belief (IIT 4.0) is the sole case exhibiting statistically significant Φ differences between score categories under temporal permutation at the task level.
Only Criterion 2 is satisfied for this single case at the task level (granularity without aggregation).
Source paper
extracted_from · (2025) · Li, Jingkai
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- Specific prediction linking IIT's prediction of high Φ for good performance to the experimental design's scoring structure.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Contrasts with temporal permutation where Span Representation dominates; suggests spatio permutation reveals different dynamics.
- Third promising case from temporal permutation analysis.
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.752 · Central interpretive claim of the paper, supported by causal ablation and activation evidence.
- One of the most promising cases; approximately corresponds to the 2/3 layer of LLaMA3.1-8B.
- Patching experiments localize truth representations to these specific hidden states in LLaMA-2 models
- Contrasts with temporal permutation results; constitutes the most suggestive evidence of potential consciousness phenomena in LLM representations.
- Mistral-7B-Instruct-v0.2 deceptive response rate reduced from 73.6% to 17.27% ± 1.88% after SOO fine-tuning · finding · 0.743 · Primary result showing SOO fine-tuning significantly reduces deception in Mistral-7B.
- Even the rare cases where good > bad do not reach the 80% significance threshold required by Criterion 1.