# FUNCTION

# Public

# UCT (&KEY ROOT FRESH-ROOT-STATE EXPLORATION-BIAS MAX-N-PLAYOUTS)

Starting from the ROOT node search the tree expanding it one node
for each playout. Finally return the mutated ROOT. ROOT may be the
root node of any tree, need not be a single node with no edges.
FRESH-ROOT-STATE is a function that returns a fresh state
corresponding to ROOT. This state will be destroyed unless special
care is taken in STATE.

# Undocumented

# UNVISITED-EDGES (NODE)

# VISITED-EDGES (NODE)

# Private

# Undocumented

# EXTREMUM (SEQ PRED &KEY (KEY #'IDENTITY))

# RANDOM-ELEMENT (SEQ)

# GENERIC-FUNCTION

# Public

# MAKE-UCT-NODE (PARENT EDGE PARENT-STATE)

Create a node representing the state of that EDGE
leads to from PARENT. Specialize this if you want to keep track of the
state which is not done by default as it can be expensive, especially
in light of TAKE-ACTION mutating it. The default implementation simply
creates an instance of the class of PARENT so that one can start from
a subclass of UCT-NODE and be sure that that class is going to be used
for nodes below it.

# OUTCOME->REWARD (NODE OUTCOME)

Compute the reward for a node in the tree from
OUTCOME that is the result of a playout. This is called by the default
implementation of UPDATE-UCT-STATISTICS. This is where one typically
negates depending on the parity of DEPTH in two player games.

# PLAY-OUT (NODE STATE REVERSE-PATH)

Play a random game from NODE with STATE and return
the outcome that's fed into UPDATE-UCT-STATISTICS. The way the random
game is played is referred to as `default policy' and that's what
makes or breaks UCT search. This function must be customized.

# SELECT-EDGE (NODE EXPLORATION-BIAS)

Choose an action to take from a state, in other
words an edge to follow from NODE in the tree. The default
implementation chooses randomly from the yet unvisited edges or if
there is none moves down the edge with the maximum EDGE-SCORE. If you
are thinking of customizing this, for example to make it choose the
minimum at odd depths, the you may want to consider specializing
REWARD or UPDATE-UCT-STATISTICS instead.

# STATE (NODE PARENT EDGE PARENT-STATE)

Return the state that corresponds to NODE. This is
not a straightforward accessor unless NODE is customized to store it.
The rest of the parameters are provided so that one can reconstruct
the state by taking the action of EDGE in the PARENT-STATE of PARENT.
It's okay to destroy PARENT-STATE in the process as long as it's not
stored elsewhere. This function must be customized.

# UPDATE-UCT-STATISTICS (ROOT PATH OUTCOME)

Increment the number of visits and update the
average reward in nodes and edges of PATH. By default, edges simply
get their visit counter incremented while nodes also get an update to
AVERAGE-REWARD based on what OUTCOME->REWARD returns.

# Undocumented

# EDGE-SCORE (NODE EDGE EXPLORATION-BIAS)

# Private

# LIST-EDGES (NODE STATE)

Return a list of edges representing the possible
actions from NODE with STATE. This function must be customized.

# SLOT-ACCESSOR

# Public

# ACTION (OBJECT)

The action represented by the edge.

# AVERAGE-REWARD (OBJECT)

Average reward over random playouts started from
below this node. See UPDATE-UCT-STATISTICS and REWARD.

# SETFAVERAGE-REWARD (NEW-VALUE OBJECT)

Average reward over random playouts started from
below this node. See UPDATE-UCT-STATISTICS and REWARD.

# EDGES (OBJECT)

Outgoing edges.

# SETFEDGES (NEW-VALUE OBJECT)

Outgoing edges.

# FROM-NODE (OBJECT)

The node this edge starts from.

# SETFFROM-NODE (NEW-VALUE OBJECT)

The node this edge starts from.

# N-VISITS (OBJECT)

The number of times this action was taken from the
parent state.

# SETFN-VISITS (NEW-VALUE OBJECT)

The number of times this action was taken from the
parent state.

# TO-NODE (OBJECT)

The node this edge points to.

# SETFTO-NODE (NEW-VALUE OBJECT)

The node this edge points to.

# Undocumented

# DEPTH (OBJECT)

# CLASS

# Public

# UCT-EDGE

An edge in the UCT tree. Represents an action taken
from a state. The value of an action is the value of its target state
which is not quite as generic as it could be; feel free to specialize
AVERAGE-REWARD for the edges if that's not the case.

# UCT-NODE

A node in the UCT tree. Roughly translates to a
state in the search space. Note that the state itself is not stored
explicity, but it can be recovered by `replaying' the actions from the
starting state or by customizing MAKE-UCT-NODE.