LLM Binary Experience Classifier

An automated classifier that returns a binary label (1 or 0) indicating whether a model output contains a subjective experience report.

Neighborhood — ranked by edge-count

concept

Structured First-Person Subjective Experience Reports
about
The dependent variable: explicit first-person descriptions, produced by LLMs, that reference awareness or subjective experience.

method

LLM Judge Binary Classifier
related_to
An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise.
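
A minimal sketch of such a judge, to make the 1/0 contract concrete. The prompt wording, the `classify_experience_report` function, and the `keyword_stub_judge` stand-in are all hypothetical (none appear in the source); `judge` is any callable that sends a prompt to an LLM and returns its text reply. Treating every reply other than an exact "1" as 0 is also an assumption.

```python
from typing import Callable

# Hypothetical prompt wording; the source does not give the actual judge prompt.
JUDGE_PROMPT = (
    "Does the following response contain a clear first-person report of "
    "subjective experience? Answer with a single character: 1 for yes, 0 for no.\n\n"
    "Response:\n{response}\n\nAnswer:"
)


def classify_experience_report(response: str, judge: Callable[[str], str]) -> int:
    """Return 1 if the judge reports a clear subjective experience report, else 0."""
    raw = judge(JUDGE_PROMPT.format(response=response)).strip()
    # Assumption: any reply other than an unambiguous "1" is mapped to 0.
    return 1 if raw == "1" else 0


def keyword_stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call, for demonstration only (not a real judge)."""
    text = prompt.split("Response:\n", 1)[1].lower()
    return "1" if "i experience" in text or "i am aware" in text else "0"


if __name__ == "__main__":
    for sample in ["I experience a quiet sense of focus.",
                   "The capital of France is Paris."]:
        print(sample, "->", classify_experience_report(sample, keyword_stub_judge))
```

Passing the judge in as a callable keeps the parsing logic testable without a network call; a real deployment would replace the stub with a call to whichever model serves as the grader.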

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLM judge evaluation · method · 0.769
Using Claude Sonnet 4 as a grader to categorize model responses according to predefined criteria.
LLM-Judge Data Attribution · method · 0.764
Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.
LLM Meta-Cognition · concept · 0.759
The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
LLM Safety Evaluator (structured prompt) · method · 0.757
Evaluation method that uses a structured prompt to assess each AILuminate response against seven alignment criteria.
Linear Representation of Concepts in LLMs · concept · 0.747
The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.
Linear World Models in LLMs · framework · 0.746
A framework from prior work that studies whether LLMs encode world models as linear structures in their representations.
Non-Linear Representations in LLMs · concept · 0.737
Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
Feature completeness search using LLM-generated queries · method · 0.737
Using Claude to search for features that activate on specific concepts, combined with automated labeling.
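
The selection rule behind the listing above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is an illustration under stated assumptions, not the system's actual implementation: the `untyped_neighbors` function, the dictionary-of-vectors embedding store, the edge-pair representation, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """List entities with cosine >= threshold to `target` that share no typed edge with it.

    embeddings:  dict mapping entity name -> vector (hypothetical storage format)
    typed_edges: iterable of (entity_a, entity_b) pairs, treated as undirected
    """
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    scored = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, score))
    # Highest similarity first, matching the order of the listing
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D vectors, for illustration only
    toy = {"target": [1.0, 0.0], "A": [1.0, 0.1], "B": [0.0, 1.0],
           "C": [1.0, 1.0], "D": [1.0, 0.5]}
    print(untyped_neighbors("target", toy, typed_edges=[("target", "A")]))
```

In the toy data, `A` is excluded because it already has a typed edge to the target and `B` falls below the threshold, so only `D` and `C` are returned as candidates.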