adversarial interaction

Competitive multi-agent setting with conflicting incentives and direct opposition via bidding and bluffing.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Adversarial ablation · method · 0.799
Technique used in VPD to enforce mechanistic faithfulness of parameter decompositions.
Interaction · concept · 0.796
Adversarial Manipulation of Truthfulness · concept · 0.787
Risk that multiple truth directions enable attacks that shift outputs without triggering the primary truth direction.
Adversarial Suffix Attack · concept · 0.779
Optimization-based jailbreak method appending strings to prompts to elicit harmful outputs.
Non Aggregative Interactions · concept · 0.772
Adversarial search for causally unimportant subcomponents · method · 0.761
Procedure in VPD that actively searches for combinations that break the prediction of which subcomponents are unimportant, stress-testing the decomposition.
Geometry of Interaction · concept · 0.747
Dynamic model connecting logic to geometry through explicit treatment of information flow and interaction; demonstrates emergent logical complexity from simple copy-cat processes.
composition as interaction · concept · 0.736
Modeling function application as feedback loops between processes that pass tokens back and forth.