method
active
method:concept-activation-vectors-tcavs

Concept Activation Vectors (TCAVs)

Method from Kim et al. (2018), Testing with Concept Activation Vectors (TCAV), for identifying concept directions in CNN activations; a precursor to probing in LLMs

Neighborhood — ranked by edge-count

Thinkers (1)

thinker
  • Been Kim
    introduces
    Author of TCAV (Testing with Concept Activation Vectors); early work supporting the Linear Representation Hypothesis

Methods (1)

method
  • Simple linear classifiers trained on model activations used as the probing technique within the introduced method.
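
The linear-probe step can be sketched as follows. This is a minimal illustration, not the implementation from Kim et al.: it assumes activations are available as plain lists of floats, uses synthetic Gaussian data in place of real model activations, and fits the probe with hand-written logistic regression. The names `train_linear_probe`, `unit`, `sample`, and `cav` are hypothetical.

```python
import math
import random

def train_linear_probe(concept_acts, baseline_acts, lr=0.1, epochs=200):
    """Fit logistic regression: concept activations (label 1) vs. baseline (label 0)."""
    dim = len(concept_acts[0])
    data = [(x, 1.0) for x in concept_acts] + [(x, 0.0) for x in baseline_acts]
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Full-batch gradient descent step on the mean log-loss.
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Synthetic stand-in for layer activations (assumption: not from a real model).
# The "concept" shifts activations along dimension 0 only.
rng = random.Random(0)
DIM = 4

def sample(shift):
    return [rng.gauss(0.0, 1.0) + (shift if i == 0 else 0.0) for i in range(DIM)]

concept_acts = [sample(+2.0) for _ in range(100)]
baseline_acts = [sample(-2.0) for _ in range(100)]

w, b = train_linear_probe(concept_acts, baseline_acts)
cav = unit(w)  # unit normal to the probe's decision boundary, pointing toward the concept

def predict(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0

correct = sum(predict(x) for x in concept_acts) + sum(not predict(x) for x in baseline_acts)
accuracy = correct / (len(concept_acts) + len(baseline_acts))
```

In practice a library classifier would likely replace the hand-written loop; the point of the sketch is only that the concept direction is read off the trained probe's weight vector.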

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Layer-40 activations with the component explained by compressed Gemini embeddings subtracted, isolating information not driven by surface text content
  • concept vector · concept · 0.750
    Computed directional vector in activation space representing a specific concept, used for injection experiments
  • Procedure extracting concept vectors as difference of mean activations between concept-exemplifying and baseline/negative sentences
  • Component of NLA that maps activations to text descriptions; initialized as copy of target LLM with supervised warm-start on summarization task.
  • Random vectors require a larger norm to trigger detection (8 vs. 2) and elicit awareness at lower rates (9/100); negated vectors are comparably effective, but the model's identification is confabulated.
  • Activations · concept · 0.710
    Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
  • Method for obtaining concept vectors by subtracting activations from two contrasting prompts.
  • Cumulative drift measure in internal representations across turns introduced by Das & Fioretto 2026
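
Two of the neighbors listed above describe extracting a concept vector by differencing activations, either as a difference of means over two sets of sentences or as a subtraction between two contrasting prompts. A minimal sketch of both, assuming activations are already extracted as equal-length lists of floats; the helper names and the toy numbers are hypothetical, not taken from any of the listed sources:

```python
def mean_vector(rows):
    """Element-wise mean over a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def diff_of_means(concept_acts, baseline_acts):
    """Concept vector = mean(concept activations) - mean(baseline activations)."""
    mu_c = mean_vector(concept_acts)
    mu_b = mean_vector(baseline_acts)
    return [c - b for c, b in zip(mu_c, mu_b)]

# Toy activations (assumption: made-up values for illustration only).
concept_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
baseline_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
vec = diff_of_means(concept_acts, baseline_acts)

# The two-prompt contrast is the special case with one activation per side.
contrast = diff_of_means([[1.0, 5.0, 2.0]], [[0.5, 5.0, 1.0]])
```

Unlike a trained linear probe, this needs no optimization; whether the two approaches recover similar directions depends on the data and is not something the sketch establishes.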