finding

active

finding:mistral-7b-on-false-belief-iit-4-0-is-the-sole-case-exhibiting-statistically-significant-differences-between-score-categories-under-temporal-permutation-at-the-task-level

Mistral-7B on False Belief (IIT 4.0) is the sole case exhibiting statistically significant Φ differences between score categories under temporal permutation at the task level.

For this single case, only Criterion 2 is satisfied at the task level (the granularity without aggregation).

Source paper

extracted_from

Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis

(2025) · Li, Jingkai

Neighborhood — ranked by edge-count

Hypotheses (1)

hypothesis

If a 'consciousness' phenomenon can be observed from ToM-related RN, higher ToM test scores should yield higher values of μΦmax (IIT 3.0) and/or μΦ (IIT 4.0).
associated_with
Specific prediction linking IIT's expectation of high Φ for good performance to the experimental design's scoring structure.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Under spatio permutation controls, IIT consciousness estimates outperform Span Representation in mean AUC in several cases (LLaMA3.1-70B on Hinting and Irony, Mistral-7B on Irony, LLaMA3.1-8B on Strange Stories). · finding · 0.760
Contrasts with temporal permutation where Span Representation dominates; suggests spatio permutation reveals different dynamics.
Layer 29 (indexed at 10) of LLaMA3.1-8B on Strange Stories (2 scores) satisfies Criteria 1 and 2 under IIT 4.0 (temporal permutation). · finding · 0.755
Third promising case from temporal permutation analysis.
Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference. · claim · 0.752
Central interpretive claim of the paper supported by causal ablation and activation evidence
Layer 24 (indexed at 8) of LLaMA3.1-8B on Hinting satisfies Criteria 1 and 2 under both IIT 3.0 and IIT 4.0 (temporal permutation). · finding · 0.751
One of the most promising cases; approximately corresponds to the 2/3 layer of LLaMA3.1-8B.
A small group of hidden states (group b) over end-of-sentence punctuation tokens is highly causally implicated in truth judgments. · finding · 0.747
Patching experiments localize truth representations to these specific hidden states in LLaMA-2 models
Under spatio permutation controls, two cases (Layer 32 of Mixtral-8x7B on Strange Stories, IIT 4.0, Linguistic Spans: Entire and Complement) satisfy all three criteria. · finding · 0.746
Contrasts with temporal permutation results; constitutes the most suggestive evidence of potential consciousness phenomena in LLM representations.
Mistral-7B-Instruct-v0.2 deceptive response rate reduced from 73.6% to 17.27% ± 1.88% after SOO fine-tuning. · finding · 0.743
Primary result showing SOO fine-tuning significantly reduces deception in Mistral-7B
None of the cases identified under temporal permutation satisfy the Criterion 1 threshold of >80% 'good' cases for any ToM task. · finding · 0.743
Even the rare cases where good > bad do not reach the 80% significance threshold required by Criterion 1.