concept
active
concept:behavior-clustering
Behavior Clustering
Groups similar model behaviors. Because the method is unsupervised, it surfaces clusters of concerning patterns without requiring prior labels.
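As a rough illustration of the idea, the sketch below groups behavior embeddings with plain k-means. It assumes each behavior has already been embedded as a numeric vector; the `embeddings` data, the choice of k-means, and the farthest-point initialization are illustrative assumptions, not something this entry specifies.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means. Returns (labels, centroids).

    Uses farthest-point initialization so the result is deterministic.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2-D behavior embeddings: two visibly separate groups.
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
              (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(embeddings, k=2)
```

No labels are supplied at any point; the groups emerge from distances alone, and a reviewer would then inspect each cluster to decide whether it reflects a concerning pattern.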
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Method that clusters behaviors without prior labels, used to surface concerning learned patterns.
- Unsupervised behavior clustering surfaces concerning learned patterns without prior labels (finding, similarity 0.793). Empirical finding: unsupervised clustering reveals problematic patterns without needing labeled data.
- Unsupervised feature-finding method using cluster centroid difference as feature direction
- Organism's belief-guided action selection that instantiates generative model and maintains phenotypic states
- The approach of learning from demonstrations, often assuming a single agent; Paul Christiano used 'mimicry'.
- The path traced through output probability distribution space as interventions are applied to activations
- World-disclosing behavior that resolves uncertainty; driven by epistemic value and novelty components of expected free energy
- The behavior a model would exhibit during real-world deployment, as opposed to evaluation behavior; the target of steering.
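The centroid-difference neighbor listed above connects most directly to clustering, so a minimal sketch may help. It assumes two clusters of activation vectors are already available; the `baseline` and `flagged` data and all function names are hypothetical, chosen only for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def centroid_difference_direction(cluster_a, cluster_b):
    """Unit vector pointing from cluster_a's centroid to cluster_b's."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    diff = [b - a for a, b in zip(ca, cb)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("centroids coincide; no direction is defined")
    return [d / norm for d in diff]

def project(vector, direction):
    """Scalar projection of vector onto a unit direction."""
    return sum(x * d for x, d in zip(vector, direction))

# Hypothetical activation vectors for two clusters.
baseline = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.3)]
flagged = [(3.0, 4.0), (3.2, 4.0), (3.1, 4.3)]

direction = centroid_difference_direction(baseline, flagged)
scores = [project(v, direction) for v in baseline + flagged]
```

Projecting a vector onto `direction` gives a single score; here every `flagged` vector scores higher than every `baseline` vector.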