Michael Brudno                  Lecture Notes 20                  7/24/200

GAMES
-----

Games such as chess, Connect-4, etc. have one thing in common: they are
complete-information games. That means all of the players have all of the
information they need in front of them; there are no unknown elements like
what cards your opponent holds or how the dice will turn out.

One type of game is the two-person, zero-sum game. "Zero sum" refers to the
fact that the sum of the outcomes for the two players will always be equal
to zero. A game which has the outcomes +1 for a win, 0 for a draw, and -1
for a loss is zero sum. We do not need to look too far for examples: chess,
connect-4, poker, and backgammon are all zero sum. It is possible to have
non-zero-sum games. Some of these are called cooperative games, like playing
the stock market: another person does not have to lose for you to be able to
win.

Classic example of a non-zero-sum game --- the Prisoner's Dilemma

Scenario: Two people, we'll call them Billy and Chris, went out and robbed a
bank. But since they are engineers, and not professionals like Bonnie and
Clyde, they got caught and put in jail. They were both taken to separate
cells and offered two choices:

1. Tell on their partner
2. Remain silent

They were also separately told what would happen to them if they chose one
road or the other:

                                         Chris
                     Silent                   | Tell
             ---------------------------------|---------------------------------
      Silent | 3 years each                   | Chris: 1 year, Billy: 10 years
BILLY        ---------------------------------|---------------------------------
      Tell   | Chris: 10 years, Billy: 1 year | 5 years each
             -------------------------------------------------------------------

It turns out that if this game is played any pre-arranged number of times,
then it is better to tell. However, if the players are not told how many
times the game will be played, they may be able to find a better strategy.
This example is obviously non-zero sum.
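The one-shot claim can be checked directly against the table: whichever
choice Chris makes, Billy serves fewer years by telling (1 < 3 if Chris
stays silent, 5 < 10 if Chris tells), and the game is symmetric. A minimal
sketch of that check (the dictionary layout and names are mine, not part of
the notes):

```python
# Sentences, in years, from the table above, keyed by (Billy's choice,
# Chris's choice). Each value is (Billy's years, Chris's years).
SENTENCES = {
    ("silent", "silent"): (3, 3),
    ("silent", "tell"):   (10, 1),
    ("tell",   "silent"): (1, 10),
    ("tell",   "tell"):   (5, 5),
}

# Whatever Chris does, Billy serves strictly fewer years by telling,
# so "tell" dominates in a single round.
for chris in ("silent", "tell"):
    billy_if_silent = SENTENCES[("silent", chris)][0]
    billy_if_tell = SENTENCES[("tell", chris)][0]
    assert billy_if_tell < billy_if_silent
```

This is exactly why the open-ended repeated game is more interesting: a
strategy can condition on the opponent's past behavior instead of defecting
every round.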
There actually was a tournament of Prisoner's Dilemma strategies, and the
winner was a procedure which played tit-for-tat: it stayed silent on the
first round and afterwards just repeated whatever the opponent did on the
previous move.

The best-studied games, however, are zero-sum complete-information ones. The
first paper on computer chess was published by Claude Shannon* in 1950, and
in 1953 Alan Turing* published the first algorithm for the game (though he
never programmed it; he did the simulations by hand).

In games where there is a concept of a position, it is useful to have an
evaluation function which takes the current position and returns how good it
is for the player. This knowledge can then be incorporated using minimax to
figure out the best move. An example of an evaluation function for
TIC-TAC-TOE (albeit a poor one) is

    3*X_3 - 3*O_3 + 2*X_2 - 2*O_2 + ...

where X_3 is the number of sequences of 3 Xs on the board. For the purposes
of your connect4 project, your eval function will return a 1 in case of a
win, a 0 in case of a loss, and in-between values for various shades of
gray. Why do we need the shades of gray? We will see below.

Minimax Trees
------- -----

Developed originally by John von Neumann*, minimax trees allow for a simple
representation of games in a computer. In most games we have some
alternation of moves: first I play, then my opponent, then me again. Also,
usually at every point when it is my move I choose between several options,
as does my opponent. Furthermore, my opponent will always choose the move
which is best for him (and, because this is a zero-sum game, worst for me),
while I try to do the reverse. As a result we have trees with max and min
nodes. At a max node I choose which path to follow; at a min node my
opponent does. I will choose the highest value among the successors; my
opponent will choose the lowest (since he wants to prevent me from reaching
a good position). All evaluations take place at the leaves and are then
propagated up to the root.
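The tic-tac-toe evaluation above can be sketched concretely. The board
representation, the helper names, and the rule that a counted sequence must
contain no opposing marks are my own assumptions, not part of the notes:

```python
# All 8 winning lines of a 3x3 board: rows, columns, and both diagonals.
LINES = ([[(r, c) for c in range(3)] for r in range(3)]
         + [[(r, c) for r in range(3)] for c in range(3)]
         + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def count(board, player, n):
    """Lines holding exactly n of player's marks and none of the opponent's."""
    opponent = 'O' if player == 'X' else 'X'
    total = 0
    for line in LINES:
        marks = [board[r][c] for r, c in line]
        if marks.count(player) == n and marks.count(opponent) == 0:
            total += 1
    return total

def evaluate(board):
    # 3*X_3 - 3*O_3 + 2*X_2 - 2*O_2, as in the formula above.
    return (3 * count(board, 'X', 3) - 3 * count(board, 'O', 3)
            + 2 * count(board, 'X', 2) - 2 * count(board, 'O', 2))
```

For example, a board where X holds two cells of the top row and O holds only
the center scores +2 for X.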
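The propagation just described can be written as a short recursive
procedure. A minimal sketch, assuming the game tree has already been built
as nested lists whose innermost entries are leaf evaluations (a real program
would generate successors from a position instead):

```python
def minimax(node, maximizing):
    # A bare number is a leaf: return its evaluation directly.
    if not isinstance(node, list):
        return node
    # Roles swap at each level: I take the best successor value,
    # my opponent takes the one that is worst for me.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```

On the example tree below, `minimax([[[.3, .7], [.4, .5]], [[.9, .8], [1, .7]]], True)`
returns 0.9, the value shown at the root.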
example:

    Max                    0.9
                      /          \
    Min            0.5            0.9
                 /     \        /     \
    Max       0.7     0.5     0.9     1.0
              / \     / \     / \     / \
    Min      .3  .7  .4  .5  .9  .8  1  0.7

In minimax we always assume our opponent will find the best move, so we do
not take risks by going to a node where our opponent loses if he plays 7 of
the 8 possible moves but wins if he finds the 8th one.

What are the leaves? In the case of a simple game like TIC-TAC-TOE, we may
be able to evaluate right down to the final position with a winner and a
loser, and return 1, 0, or .5 as appropriate. In chess, however, the longest
possible game is about 5600 moves, and there are about 20 possible moves at
every point. 20^5600 is a VERY large number, and we have no chance of
evaluating all possible positions. This phenomenon is known as exponential
explosion. Because you can't evaluate the whole tree, you settle for
estimates: in chess you may be able to say "I have a queen for his knight; a
queen is worth 9 points and a knight 3, so I am 6 points better in this
position." The evaluation function allows you to place the leaves at just
the furthest level to which you have the time to calculate. It is also
possible to use iterative deepening (more on that below).

Pseudocode for minMax:

    minMax(P):
        If P is a leaf, return eval(P).
        Let K_1, K_2, ..., K_n be the successors of P.
        If P is a MAX node,
            return the largest of minMax(K_1), ..., minMax(K_n);
        else (P is a MIN node),
            return the smallest of minMax(K_1), ..., minMax(K_n).

Note that minimax is easily extensible to games of chance which have
concepts of position and move order, such as backgammon or poker. (The
implementation is left to the reader.)

AlphaBeta Pruning
--------- -------

The name AlphaBeta comes from the names of the two variables usually used
for the pruning.

CUTOFFS

A shallow cutoff is when the information needed to achieve a cutoff comes
from the node's parent. In the example below, A is the parent of B.

    Max       ???
             /
    Min     A
           / \
    Max  10   B
             / \
           14   C

Here we see that we can ignore the value of the subtree beginning with
node C.
This is because B will return the maximum of 14 and C, which will be greater
than or equal to 14; but this does not matter, because node A is a MIN node
and will return the minimum of 10 and B, which must be 10.

In a deep cutoff you have to look farther up the tree to find the reason we
can ignore some node. The root of the tree below is actually the
great-great-grandfather of X, where the cutoff will occur.

    Max       ???
             /   \
    Min    10     ??
                 /
    Max         F
               / \
    Min       E   G
             / \
            5   X

Here we see that we can ignore the value of X, because whatever node E
returns will be less than or equal to 5, but we already have a choice of 10
by following the other branch from the root node.

Let us define our two bounds, alpha and beta:

Alpha - the largest value seen so far at any MAX node in the ancestry of
        the current node, or at this node itself.
Beta  - the smallest value seen so far at any MIN node in the ancestry of
        the current node, or at this node itself.

If we are looking at a MAX node and the value of alpha rises above beta, we
can discontinue the calculation, since we know that the MIN node which had a
choice between us and the beta value will always choose the beta value. The
same goes for beta: if it falls below alpha, we can discontinue checking,
since the MAX node will always choose the successor with the value alpha.

The following is the pseudocode for AlphaBeta:

    ABProcedure: V(position J, alpha, beta)

    1.  If J is terminal, return eval(J). Else let J_1, ..., J_n be the n
        successors of J, and let k = 1. If J is a MAX node, go to 2;
        else go to 2'.
    2.  Set alpha = max[alpha, V(J_k, alpha, beta)].
    3.  If alpha >= beta, return beta.
    4.  If k == n, return alpha. Else set k = k+1 and go to 2.
    2'. Set beta = min[beta, V(J_k, alpha, beta)].
    3'. If beta <= alpha, return alpha.
    4'. If k == n, return beta. Else set k = k+1 and go to 2'.

How effective is AlphaBeta? It depends on how good you are at "guessing"
good moves.
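The ABProcedure translates almost line for line into a recursive function. A
sketch using the same nested-list trees as in the minimax section (this
"fail-hard" version returns beta or alpha on a cutoff, exactly as in steps 3
and 3'):

```python
def ab(node, alpha, beta, maximizing):
    if not isinstance(node, list):      # step 1: terminal, return eval(J)
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, ab(child, alpha, beta, False))   # step 2
            if alpha >= beta:           # step 3: cutoff, skip the rest
                return beta
        return alpha                    # step 4
    else:
        for child in node:
            beta = min(beta, ab(child, alpha, beta, True))      # step 2'
            if beta <= alpha:           # step 3': cutoff
                return alpha
        return beta                     # step 4'
```

Called as `ab(tree, float('-inf'), float('inf'), True)`, it returns the same
root value as plain minimax while never descending into pruned subtrees.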
If you always examine the moves from the best to the worst, the number of
cutoffs will be maximized: if P is the number of leaf nodes evaluated using
AlphaBeta and Q is the number evaluated by the usual minimax, then
P ~= sqrt(Q), a two-fold increase in searchable depth! If you always start
by looking at bad moves, AlphaBeta will run at the same rate as minimax.
Because AlphaBeta runs better when you consider good moves first, game
program authors very commonly use iterative deepening: run AB to depth 1,
remembering the evaluation results for the moves; then run to depth 2, but
instead of going from left to right, go from the best move to the worst;
then repeat at depth 3, and so on. Because each successive level takes
several times longer than the previous one, the time spent re-searching the
shallow levels is small, and the final depth of your search is almost
unaffected.

Transposition Tables
------------- ------

In some games the number of possible moves is large, but the number of
possible positions is small. This is the case in chess endgames (when there
are few pieces on the board). In these cases it very much pays to use
transposition tables. The idea behind these tables is that a position can be
reached through different combinations of moves, and it is wasteful to
re-evaluate positions. The implementation comes straight out of 61A:
memoization. When we finish evaluating a position (leaf or internal) we can
add it to a hashtable, and before we evaluate any node we check whether it
is already in our hashtable. One must be careful when doing this: the whole
board takes up some memory, and storing it in an uncompressed format is
expensive, considering the number of positions you may want to store in
your table.

* These are famous guys whose names you should know if you want to be a
  well-educated computer scientist. None of them, by the way, are best known
  for games.
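The memoization idea can be sketched in a few lines. The names and the tuple
key are mine; a real chess program would compress the position (for example
with Zobrist hashing) rather than store the whole board as the key:

```python
# Transposition table: position -> previously computed evaluation.
table = {}

def evaluate_with_table(position, evaluate):
    key = tuple(position)        # immutable, hashable stand-in for the board
    if key not in table:         # evaluate each distinct position only once
        table[key] = evaluate(position)
    return table[key]
```

Wrapping the node evaluation of minimax or AlphaBeta this way means a
position reached by two different move orders is evaluated only the first
time it is seen.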