Common Lisp Package: MICMAC.UCT

UCT Monte Carlo tree search.

README:

FUNCTION

Public

UCT (&KEY ROOT FRESH-ROOT-STATE EXPLORATION-BIAS MAX-N-PLAYOUTS)

Starting from the ROOT node search the tree expanding it one node for each playout. Finally return the mutated ROOT. ROOT may be the root node of any tree, need not be a single node with no edges. FRESH-ROOT-STATE is a function that returns a fresh state corresponding to ROOT. This state will be destroyed unless special care is taken in STATE.

Undocumented

UNVISITED-EDGES (NODE)

VISITED-EDGES (NODE)

Private

Undocumented

EXTREMUM (SEQ PRED &KEY (KEY #'IDENTITY))

RANDOM-ELEMENT (SEQ)

GENERIC-FUNCTION

Public

MAKE-UCT-NODE (PARENT EDGE PARENT-STATE)

Create a node representing the state of that EDGE leads to from PARENT. Specialize this if you want to keep track of the state which is not done by default as it can be expensive, especially in light of TAKE-ACTION mutating it. The default implementation simply creates an instance of the class of PARENT so that one can start from a subclass of UCT-NODE and be sure that that class is going to be used for nodes below it.

OUTCOME->REWARD (NODE OUTCOME)

Compute the reward for a node in the tree from OUTCOME that is the result of a playout. This is called by the default implementation of UPDATE-UCT-STATISTICS. This is where one typically negates depending on the parity of DEPTH in two player games.

PLAY-OUT (NODE STATE REVERSE-PATH)

Play a random game from NODE with STATE and return the outcome that's fed into UPDATE-UCT-STATISTICS. The way the random game is played is referred to as `default policy' and that's what makes or breaks UCT search. This function must be customized.

SELECT-EDGE (NODE EXPLORATION-BIAS)

Choose an action to take from a state, in other words an edge to follow from NODE in the tree. The default implementation chooses randomly from the yet unvisited edges or if there is none moves down the edge with the maximum EDGE-SCORE. If you are thinking of customizing this, for example to make it choose the minimum at odd depths, the you may want to consider specializing REWARD or UPDATE-UCT-STATISTICS instead.

STATE (NODE PARENT EDGE PARENT-STATE)

Return the state that corresponds to NODE. This is not a straightforward accessor unless NODE is customized to store it. The rest of the parameters are provided so that one can reconstruct the state by taking the action of EDGE in the PARENT-STATE of PARENT. It's okay to destroy PARENT-STATE in the process as long as it's not stored elsewhere. This function must be customized.

UPDATE-UCT-STATISTICS (ROOT PATH OUTCOME)

Increment the number of visits and update the average reward in nodes and edges of PATH. By default, edges simply get their visit counter incremented while nodes also get an update to AVERAGE-REWARD based on what OUTCOME->REWARD returns.

Undocumented

EDGE-SCORE (NODE EDGE EXPLORATION-BIAS)

Private

LIST-EDGES (NODE STATE)

Return a list of edges representing the possible actions from NODE with STATE. This function must be customized.

SLOT-ACCESSOR

Public

ACTION (OBJECT)

The action represented by the edge.

AVERAGE-REWARD (OBJECT)

Average reward over random playouts started from below this node. See UPDATE-UCT-STATISTICS and REWARD.

SETFAVERAGE-REWARD (NEW-VALUE OBJECT)

Average reward over random playouts started from below this node. See UPDATE-UCT-STATISTICS and REWARD.

EDGES (OBJECT)

Outgoing edges.

SETFEDGES (NEW-VALUE OBJECT)

Outgoing edges.

FROM-NODE (OBJECT)

The node this edge starts from.

SETFFROM-NODE (NEW-VALUE OBJECT)

The node this edge starts from.

N-VISITS (OBJECT)

The number of times this action was taken from the parent state.

SETFN-VISITS (NEW-VALUE OBJECT)

The number of times this action was taken from the parent state.

TO-NODE (OBJECT)

The node this edge points to.

SETFTO-NODE (NEW-VALUE OBJECT)

The node this edge points to.

Undocumented

DEPTH (OBJECT)

CLASS

Public

UCT-EDGE

An edge in the UCT tree. Represents an action taken from a state. The value of an action is the value of its target state which is not quite as generic as it could be; feel free to specialize AVERAGE-REWARD for the edges if that's not the case.

UCT-NODE

A node in the UCT tree. Roughly translates to a state in the search space. Note that the state itself is not stored explicity, but it can be recovered by `replaying' the actions from the starting state or by customizing MAKE-UCT-NODE.