method
active
method:greedy-policy
ε-greedy Policy
Exploration-exploitation policy used in combination with Q-learning
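A minimal sketch of the selection rule, assuming a discrete action set and a list of Q-value estimates for the current state. The names `epsilon_greedy`, `q_values`, `epsilon`, and `rng` are illustrative and do not come from this entry; ties among maximal actions are broken uniformly, which is one common choice rather than a requirement.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given Q-value estimates for one state.

    With probability `epsilon` a uniformly random action is chosen (explore);
    otherwise an action with the highest estimate is chosen (exploit).
    All names here are hypothetical, for illustration only.
    """
    if rng.random() < epsilon:
        # Explore: any action, including the currently greedy one.
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Exploit: break ties uniformly among actions with the maximal estimate.
    greedy_actions = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(greedy_actions)
```

In a Q-learning loop, a call such as `epsilon_greedy(q_table[state], 0.1)` would pick the action to take before the usual value update; `q_table` is likewise an assumed name.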
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A heuristic exploration strategy that selects a random action with probability epsilon, otherwise acts greedily.
- Sequence of actions considered by the agent; basis for planning.
- Baseline self-report method selecting the highest-probability token; shown to collapse to a few uninformative values.
- Choosing sequences of actions based on expected free energy; the prior probability of a policy is a softmax of expected free energy.
- The active sampling of observations to maximize information gain and resolve uncertainty about the environment.
- Distinguished value initially associated with every key combination in associative memory m; propagates through operations to signal missing values; enables termination of recursive delegation.
- Deterministic code agent that models the resource economy, tracking money flows and exploiting cash-poor opponents.
- Definition of sequential policy in active inference, contrasting with state-action policies in RL.