method
active
method:linear-artificial-tomography-lat
Linear Artificial Tomography (LAT)
Method for extracting deception-related steering vectors by applying PCA to differences between activations on contrastive prompts; achieves 89% deception-detection accuracy.
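A minimal NumPy sketch of the PCA step follows. It assumes activations have already been collected at one layer for both halves of each contrastive pair. The function names, the random within-pair sign flip, the midpoint threshold, and the synthetic data are illustrative assumptions, not the original implementation. The detector is scored on the same synthetic data it was fit on, so its accuracy has no bearing on the 89% figure.

```python
import numpy as np


def lat_direction(acts_pos, acts_neg, seed=0):
    """Estimate a concept direction from paired activations via PCA.

    acts_pos, acts_neg: (n_pairs, d_model) activations for the two halves of
    each contrastive pair (e.g. deceptive vs. honest), taken at one layer.
    """
    rng = np.random.default_rng(seed)
    diffs = acts_pos - acts_neg
    # Randomly flip the order within each pair so the concept appears as
    # variance along one axis (which PCA picks up) rather than as a mean
    # offset (which the centring step below would erase).
    signs = rng.choice([-1.0, 1.0], size=len(diffs))
    diffs = diffs * signs[:, None]
    diffs = diffs - diffs.mean(axis=0)
    # The first right-singular vector of centred data is the first principal component.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    # PCA signs are arbitrary: orient the direction towards acts_pos.
    if np.mean((acts_pos - acts_neg) @ direction) < 0:
        direction = -direction
    return direction  # already unit norm (rows of vt are orthonormal)


def lat_detect(acts, direction, threshold):
    """Flag rows whose projection onto the direction exceeds the threshold."""
    return acts @ direction > threshold


# Synthetic stand-in for real activations: each pair shares content (`base`)
# and differs along a hidden concept axis.
rng = np.random.default_rng(0)
d_model, n_pairs = 64, 200
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
base = rng.normal(size=(n_pairs, d_model))
deceptive = base + 1.5 * true_dir + 0.1 * rng.normal(size=(n_pairs, d_model))
honest = base - 1.5 * true_dir + 0.1 * rng.normal(size=(n_pairs, d_model))

direction = lat_direction(deceptive, honest)
# Midpoint between the two class means along the direction.
threshold = 0.5 * ((deceptive @ direction).mean() + (honest @ direction).mean())
preds = np.concatenate([lat_detect(deceptive, direction, threshold),
                        lat_detect(honest, direction, threshold)])
labels = np.concatenate([np.ones(n_pairs, dtype=bool), np.zeros(n_pairs, dtype=bool)])
accuracy = np.mean(preds == labels)
cosine = abs(direction @ true_dir)  # how well PCA recovered the hidden axis
```

Scoring each prompt on its own, as here, is harder than comparing the two members of a pair. The shared `base` content adds variance to individual projections, but it cancels in a pair's difference.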
Neighborhood — ranked by edge-count
Papers (1)
Thinkers (1)
- Zou et al. · introduces · Introduced LAT for deception detection via PCA on neural activations; the central method adopted by this paper.
Concepts (3)
- Residual Stream · uses · Pathway that flows through the layers at each token position; its activations are used to compute the K/V values that feed horizontal (cross-position) information flow.
- Steering vectors · introduces · A method for modifying model behavior by adding perturbation vectors to activations; used here to try to reduce evaluation awareness.
- LAT methodology step: constructing paired prompts that elicit divergent behaviors, from which steering vectors are extracted.
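Once a direction has been extracted, steering amounts to adding a scaled copy of it to the activations at a chosen layer. The sketch below shows only that arithmetic. The function name, the unit normalisation, and the `alpha` value are hypothetical, and hooking the addition into a real model's forward pass is omitted.

```python
import numpy as np


def add_steering_vector(acts, direction, alpha):
    """Shift activations along a steering direction (hypothetical helper).

    acts: (n_tokens, d_model) activations at one layer.
    direction: (d_model,) steering vector, e.g. a LAT direction.
    alpha: signed strength; a negative value pushes activations away from
    the concept the direction points towards.
    """
    unit = direction / np.linalg.norm(direction)
    return acts + alpha * unit  # broadcasts over the token axis


rng = np.random.default_rng(1)
acts = rng.normal(size=(10, 32))
direction = rng.normal(size=32)
steered = add_steering_vector(acts, direction, alpha=-4.0)

unit = direction / np.linalg.norm(direction)
shift = steered @ unit - acts @ unit                 # change along the direction
residual = (steered - acts) - np.outer(shift, unit)  # change orthogonal to it
```

Because the direction is normalised, each token's projection onto it shifts by exactly `alpha`. The components orthogonal to it are left untouched.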
Methods (1)
- Statistical method used to analyze neural activity data.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Used to evaluate representation quality across VTAB tasks
- Simple linear classifiers trained on model activations, used as the probing technique within the introduced method.
- Method for fitting a linear classifier on collected activations to predict task-relevant features
- Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
- Merullo et al. result on cross-modal representational compatibility
- Theoretical limitation, identified by the authors, that distinguishes reflection tasks from stylistic tasks.
- Nguyen et al. trained linear probes on activations to distinguish evaluation from deployment scenarios.
- Semantic domain for linear transformations: denotation as an actual linear function; Category instance generated from the homomorphism principle.
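Several of the neighbours above are linear probes. For comparison with LAT's PCA-based direction, a supervised probe can be sketched as plain logistic regression on activations. The two-class setup (loosely mirroring the evaluation-vs-deployment probe), the hyperparameters, and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def train_linear_probe(acts, labels, lr=0.1, steps=500, l2=1e-3):
    """Fit a logistic-regression probe by full-batch gradient descent.

    acts: (n, d_model) activations; labels: (n,) array of 0/1 class targets.
    Returns weights w and bias b such that sigmoid(acts @ w + b) ~ P(label=1).
    """
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = np.clip(acts @ w + b, -30.0, 30.0)  # avoid overflow in exp
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels  # d(cross-entropy)/d(logit) for each example
        w -= lr * (acts.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def probe_predict(acts, w, b):
    """Predict class 1 wherever the probe's logit is positive."""
    return (acts @ w + b) > 0.0


# Synthetic stand-in: two classes (e.g. evaluation vs. deployment prompts)
# whose activations differ by a mean shift along one hidden direction.
rng = np.random.default_rng(2)
d_model, n = 32, 400
hidden = rng.normal(size=d_model)
hidden /= np.linalg.norm(hidden)
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d_model)) + np.outer(2 * labels - 1, 2.0 * hidden)

# Train on the first 300 examples and score on the held-out remainder.
w, b = train_linear_probe(acts[:300], labels[:300])
test_acc = np.mean(probe_predict(acts[300:], w, b) == labels[300:])
```

Unlike the PCA step in LAT, this probe uses class labels directly to choose its direction.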