method
active
method:concept-activation-vectors-tcavs
Concept Activation Vectors (TCAVs)
Method from Kim et al. (2018), Testing with Concept Activation Vectors (TCAV), for identifying concept directions in CNN activations; a precursor to LLM probing
Neighborhood — ranked by edge-count
Thinkers (1)
thinker
- Been Kim · introduces · Author of TCAV (concept activation vectors); early work supporting the Linear Representation Hypothesis
Methods (1)
method
- Linear Probe · extends · Simple linear classifiers trained on model activations, used as the probing technique within the introduced method.
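As a rough illustration of how a linear probe yields a concept direction, the sketch below fits a small logistic-regression classifier to separate concept activations from random ones and takes the normalized weight vector as the concept activation vector. It then computes a TCAV-style score as the fraction of inputs whose gradient has a positive component along that vector. The activations and gradients are synthetic stand-ins, and every function name (`train_linear_probe`, `concept_activation_vector`, `tcav_score`) is hypothetical; this is pure Python to stay dependency-free, not the authors' implementation.

```python
import math
import random


def train_linear_probe(pos, neg, lr=0.1, epochs=200):
    """Fit logistic regression separating pos (label 1) from neg (label 0)
    with plain batch gradient descent. Returns (weights, bias)."""
    dim = len(pos[0])
    w = [0.0] * dim
    b = 0.0
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(concept)
            err = p - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def concept_activation_vector(w):
    """Unit vector orthogonal to the probe's decision boundary."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def tcav_score(gradients, cav):
    """Fraction of inputs whose gradient (of a class logit with respect to
    the layer activations) has a positive dot product with the CAV."""
    positive = sum(
        1 for g in gradients if sum(gi * ci for gi, ci in zip(g, cav)) > 0
    )
    return positive / len(gradients)


# Synthetic activations (assumption): the concept shifts dimension 0 by +2.
rng = random.Random(0)
dim = 4
concept_acts = [
    [rng.gauss(0, 1) + (2.0 if i == 0 else 0.0) for i in range(dim)]
    for _ in range(50)
]
random_acts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]

w, b = train_linear_probe(concept_acts, random_acts)
cav = concept_activation_vector(w)

# Hypothetical per-input gradients; in practice these come from backprop.
gradients = [
    [1.0, 0.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
]
score = tcav_score(gradients, cav)
```

In a real setting the activations would be read from a chosen layer of the network, and the probe would typically be trained with an established library rather than a hand-written loop.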
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Layer-40 activations with the component explained by compressed Gemini embeddings subtracted, isolating information not driven by surface text content
- Computed directional vector in activation space representing a specific concept, used for injection experiments
- Procedure extracting concept vectors as difference of mean activations between concept-exemplifying and baseline/negative sentences
- Component of NLA that maps activations to text descriptions; initialized as copy of target LLM with supervised warm-start on summarization task.
- Random vectors require larger norm to trigger detection (8 vs 2); elicit awareness at lower rates (9/100); negated vectors comparably effective but model identification confabulated.
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
- Method for obtaining concept vectors by subtracting activations from two contrasting prompts.
- Cumulative drift measure in internal representations across turns introduced by Das & Fioretto 2026
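Several of the entries above describe obtaining a concept vector as a difference of mean activations between concept-exemplifying and baseline inputs. A minimal sketch of that arithmetic follows, assuming activations are already available as equal-length lists of floats; the helper names (`mean_vector`, `difference_of_means`, `project`) and the toy numbers are illustrative only.

```python
import math


def mean_vector(rows):
    """Element-wise mean over a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(concept_acts, baseline_acts):
    """Concept vector = mean(concept activations) - mean(baseline activations)."""
    mu_c = mean_vector(concept_acts)
    mu_b = mean_vector(baseline_acts)
    return [c - b for c, b in zip(mu_c, mu_b)]


def project(activation, direction):
    """Scalar projection of an activation onto the (normalized) direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * d for a, d in zip(activation, direction)) / norm


# Toy activations (assumption): the concept only affects dimension 0.
concept_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
baseline_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

concept_vec = difference_of_means(concept_acts, baseline_acts)
strength = project([1.0, 5.0, 5.0], concept_vec)
```

Unlike the linear-probe construction, this variant needs no training step, which may explain why it appears in the injection and steering entries; which variant a given entry used is not stated here.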