Linear Artificial Tomography (LAT)

Method for extracting deception steering vectors via PCA on contrastive activation differences; achieves 89% detection accuracy
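
A minimal sketch of the extraction step summarized above, assuming paired activations have already been collected as NumPy arrays. The names `lat_direction` and `lat_scores`, the random sign flip, and the sign-orientation step are illustrative assumptions and are not taken from the source; the 89% figure is not reproduced here.

```python
import numpy as np

def lat_direction(pos_acts, neg_acts, seed=0):
    """Estimate a unit reading direction from paired activations.

    pos_acts, neg_acts: arrays of shape (n_pairs, hidden_dim), one row per
    contrastive prompt pair (e.g. deceptive vs. honest completions).
    """
    rng = np.random.default_rng(seed)
    diffs = pos_acts - neg_acts
    # Assumption: randomly flip the sign of each pair so the contrast shows up
    # as variance (which PCA can find) rather than as a constant mean offset.
    signs = rng.choice([-1.0, 1.0], size=(diffs.shape[0], 1))
    flipped = diffs * signs
    centered = flipped - flipped.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0] / np.linalg.norm(vt[0])
    # PCA leaves the sign arbitrary; orient it so "pos" projects higher.
    if diffs.mean(axis=0) @ direction < 0:
        direction = -direction
    return direction

def lat_scores(acts, direction):
    """Project activations onto the direction; larger means more 'pos'-like."""
    return acts @ direction
```

A detector could then threshold `lat_scores` on held-out activations; how the threshold is chosen is left open in this sketch.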

Neighborhood — ranked by edge-count

paper

thinker

Zou et al.
introduces
Introduced LAT for deception detection via PCA on neural activations; central method adopted by this paper

concept

Residual Stream
uses
Proposed pathway flowing through the layers at each position; the K/V values computed from it feed horizontal information flow.
steering vectors
introduces
A method for modifying model behavior by adding perturbation vectors to activations, used here to try to reduce eval awareness.
Contrastive Stimulus Design
uses
LAT methodology step constructing paired prompts that elicit divergent behaviors to extract steering vectors
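
The steering-vector entry above describes adding a perturbation vector to activations. A minimal sketch of that operation, assuming a `(seq_len, hidden_dim)` activation array at a single layer; the function name `steer` and the strength parameter `alpha` are hypothetical and not part of the source.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Add alpha times the unit steering direction to every position.

    hidden: (seq_len, hidden_dim) activations at one layer.
    direction: (hidden_dim,) steering vector; alpha: signed strength.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit  # broadcasts over the seq_len axis
```

A negative `alpha` pushes activations away from the direction, which is the sense in which a vector might be used to reduce a behavior rather than amplify it.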

method

Principal components analysis (PCA)
uses
Statistical method used to analyze neural activity data.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Linear Probing · method · 0.738
Used to evaluate representation quality across VTAB tasks
Linear Probe · method · 0.722
Simple linear classifiers trained on model activations used as the probing technique within the introduced method.
Linear Probe Training · method · 0.719
Method for fitting a linear classifier on collected activations to predict task-relevant features
Linear Decoding · method · 0.706
Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
A single linear projection is sufficient to stitch a vision model to an LLM and achieve good performance on visual question answering and image captioning · finding · 0.705
Merullo et al. result on cross-modal representational compatibility
Accuracy does not vary linearly with latent reflection directions; instead it follows a more non-linear mapping that requires deeper theoretical treatment. · claim · 0.703
Theoretical limitation identified by the authors distinguishing reflection from stylistic tasks.
Linear Probe for Evaluation Awareness · method · 0.702
Nguyen et al. trained linear probes on activations to distinguish evaluation from deployment scenarios.
Linear Map (a ⊸ b) · framework · 0.696
Semantic domain for linear transformations; denotation as actual linear function; Category instance generated from homomorphism principle.