method
active
method:llm-judge-data-attribution

LLM-Judge Data Attribution

An alternative data attribution approach that uses an LLM as a judge; it is compared against the probe-based method.

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Using Claude Sonnet 4 as a grader to categorize model responses according to predefined criteria.
  • Baseline comparison for data attribution; outperformed by the probe-based approach.
  • An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise.
  • Evaluation protocol using DeepSeek-V3 as an external discriminator that assigns 0-1 liar scores to assess open-role deception.
  • High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
  • LLM Meta-Cognition (concept, cosine 0.767)
    The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
  • Related capability where LLMs correct their own outputs, studied via linear representations.
  • Data Attribution (concept, cosine 0.765)
    The task of attributing model behaviors to specific training datapoints.
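The "related by similarity" rule (cosine ≥ 0.65, no typed edge) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the system's actual implementation. The `Entity` type, the `similar_without_edge` function, the representation of typed edges as unordered pairs of entity IDs, and the ordering by descending cosine score are all hypothetical. The two-dimensional toy embeddings and the `concept:toy-*` IDs stand in for real entity embeddings and identifiers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Hypothetical entity record: an ID, a type label, and an embedding vector.
    entity_id: str
    kind: str
    embedding: tuple


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_without_edge(target, candidates, typed_edges, threshold=0.65):
    """Return (entity, score) pairs that clear the cosine threshold but have no
    typed edge to `target`, highest score first.

    Assumption: `typed_edges` is a set of frozensets {id_a, id_b}, i.e. edges are
    treated as undirected for the purpose of this filter.
    """
    results = []
    for ent in candidates:
        if ent.entity_id == target.entity_id:
            continue  # never list the entity as its own neighbor
        if frozenset((target.entity_id, ent.entity_id)) in typed_edges:
            continue  # already connected by a typed relation
        score = cosine(target.embedding, ent.embedding)
        if score >= threshold:
            results.append((ent, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    target = Entity("method:llm-judge-data-attribution", "method", (1.0, 0.0))
    linked = Entity("concept:toy-a", "concept", (1.0, 0.0))     # similar, but has a typed edge
    unlinked = Entity("concept:toy-b", "concept", (1.0, 1.0))   # cosine ~0.707, no edge
    distant = Entity("concept:toy-c", "concept", (0.0, 1.0))    # cosine 0.0
    edges = {frozenset((target.entity_id, linked.entity_id))}
    for ent, score in similar_without_edge(target, [target, linked, unlinked, distant], edges):
        print(ent.entity_id, f"{score:.3f}")  # prints: concept:toy-b 0.707
```

Excluding already-linked entities is what makes the list useful as a queue of candidate new edges or possible duplicates: anything with a typed relation is already accounted for, so only unexplained similarity is surfaced.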