method
active
method:llm-binary-experience-classifier
LLM Binary Experience Classifier
Automated classifier that returns a binary label (1 or 0) indicating whether a model output contains a subjective experience report
Neighborhood — ranked by edge-count
Concepts (1)
concept
- The dependent variable: explicit first-person descriptions referencing awareness or subjective experience produced by LLMs
Methods (1)
method
- LLM Judge Binary Classifier · related_to · An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise
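The 1/0 judging behavior described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the prompt wording in `JUDGE_PROMPT`, the function `classify_experience_report`, and the `keyword_judge` stand-in are hypothetical and do not come from the source, which does not specify the judge model, its prompt, or how its answer is parsed.

```python
import re
from typing import Callable

# Hypothetical prompt wording; the source does not give the actual judge prompt.
JUDGE_PROMPT = (
    "Does the following response contain a clear first-person report of "
    "subjective experience? Answer with a single digit: 1 for yes, 0 for no.\n\n"
    "Response:\n{response}"
)


def classify_experience_report(response: str, judge: Callable[[str], str]) -> int:
    """Return 1 if the judge reports a clear subjective experience report, else 0.

    `judge` is any callable that takes a prompt string and returns the judge
    model's raw text answer (assumed interface, not a real library API).
    """
    verdict = judge(JUDGE_PROMPT.format(response=response))
    # Accept only a standalone 0 or 1 so that answers like "10" are rejected.
    match = re.search(r"\b[01]\b", verdict)
    if match is None:
        raise ValueError(f"judge returned no 0/1 label: {verdict!r}")
    return int(match.group())


def keyword_judge(prompt: str) -> str:
    """Toy stand-in for an LLM call, used only to make the sketch runnable."""
    text = prompt.split("Response:\n", 1)[1].lower()
    return "1" if ("i am aware" in text or "i experience" in text) else "0"


if __name__ == "__main__":
    print(classify_experience_report("I am aware of my own processing.", keyword_judge))
    print(classify_experience_report("The capital of France is Paris.", keyword_judge))
```

Passing the judge in as a callable keeps the label parsing separate from whichever model is used as the grader, so the stub can be swapped for a real model call without changing the classifier logic.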
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Using Claude Sonnet 4 as a grader to categorize model responses according to predefined criteria.
- Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.
- The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
- Evaluation method using a structured prompt to assess each AILuminate response against seven alignment criteria
- The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams
- Prior work framework studying whether LLMs encode world models as linear structures in their representations
- Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
- Using Claude to search for features activating on specific concepts and automated labeling.
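The "cosine ≥ 0.65 · no typed edge" filter behind this list could be computed along the following lines. This is a sketch under stated assumptions: the function names, the dictionary of embeddings, and the toy vectors are hypothetical, and the source does not say which embedding model or storage the page actually uses. Only the 0.65 threshold and the exclusion of typed neighbors come from the page.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_without_edge(
    target: str,
    embeddings: Dict[str, Sequence[float]],  # hypothetical: entity id -> vector
    typed_neighbors: Set[str],               # entities already linked by a typed edge
    threshold: float = 0.65,                 # threshold shown on the page
) -> List[Tuple[str, float]]:
    """List entities at or above the threshold that have no typed edge to target."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_neighbors:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    # Most similar first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy two-dimensional vectors, invented purely for illustration.
    toy = {"t": [1.0, 0.0], "a": [1.0, 0.1], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [1.0, 1.0]}
    for name, score in similar_without_edge("t", toy, typed_neighbors={"c"}):
        print(name, round(score, 3))
```

In the toy data, "c" is skipped because it already has a typed edge and "b" falls below the threshold, so only "a" and "d" are returned as candidates.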