method
active
method:feature-completeness-search-using-llm-generated-queries
Feature completeness search using LLM-generated queries
Uses Claude to search for features that activate on specific concepts, combined with automated labeling.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood that have no typed relation to this one; candidates for new edges or unrecognized duplicates.
- Claude 3 Opus ratings aligned with human judgment of feature descriptions.
- Sharma et al. result supporting cross-modal alignment: language-only models implicitly encode visual structure
- Central question motivating attribute exploration.
- Motivating claim supported by the CAPTCHA example and Perez et al. (2022) findings
- Do LLMs have a unified representation of truth that spans structurally and topically diverse data? · question · 0.744 · Central research question driving dataset design and experimental approach
- Establishes that the observed linear structure is not merely a representation of text probability
- Interpretive claim connecting scale to abstraction level in LLM representations
- We hypothesize that LLMs represent correctness of arithmetic expressions differently from factual statements. · hypothesis · 0.739 · Core working hypothesis motivating the factual vs. arithmetic task split in the experimental design.
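
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched roughly as below. This is a minimal illustration and not the system's actual implementation: the function names, the `embeddings` dictionary, and the `typed_edges` set are all hypothetical, and it assumes each entity has a plain numeric embedding vector.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to the target, highest score first.

    embeddings  -- hypothetical dict: entity id -> embedding vector
    typed_edges -- hypothetical set of frozenset({id_a, id_b}) pairs
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # an entity is not related to itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already connected by a typed relation
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Entities returned by such a filter are only candidates: a high score suggests a missing edge or a duplicate, but does not establish either.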