Michael Brudno                  Lecture Notes 20                  7/24/200

GAMES
-----

Games such as chess, Connect-4, etc. have one thing in common: they are
complete-information games. That means all of the players have all of the
information they need in front of them; there are no unknown elements like
what cards your opponent holds or how the dice will turn out.

One type of game is the two-person, zero-sum game. "Zero sum" refers to the
fact that the sum of the outcomes for the two players will always be equal
to zero. A game which has the outcomes +1 for a win, 0 for a draw, and -1
for a loss is zero sum. We do not need to look too far for examples: chess,
connect-4, poker, and backgammon are all zero sum. It is possible to have
non-zero-sum games. Some of these are called cooperative games, like playing
the stock market: another person does not have to lose for you to be able to
win.

Classic example of a non-zero-sum game --- the Prisoner's Dilemma

Scenario: Two people, we'll call them Billy and Chris, went out and robbed a
bank. But since they are engineers, and not professionals like Bonnie and
Clyde, they got caught and put in jail. They were both taken to separate
cells and offered two choices:

1. Tell on their partner
2. Remain silent

They were also separately told what would happen to them if they chose one
road or the other:

                                         Chris
                     Silent                   | Tell
             ---------------------------------|---------------------------------
      Silent | 3 years each                   | Chris: 1 year, Billy: 10 years
BILLY        ---------------------------------|---------------------------------
      Tell   | Chris: 10 years, Billy: 1 year | 5 years each
             -------------------------------------------------------------------

It turns out that if this game is played any pre-arranged number of times,
then it is better to tell. However, if the players are not told how many
times the game will be played, they may be able to find a better strategy.
This example is obviously non-zero sum.
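The one-shot claim can be checked directly against the table: whichever
choice Chris makes, Billy serves fewer years by telling (1 < 3 if Chris
stays silent, 5 < 10 if Chris tells), and the game is symmetric. A minimal
sketch of that check (the dictionary layout and names are mine, not part of
the notes):

```python
# Sentences, in years, from the table above, keyed by (Billy's choice,
# Chris's choice). Each value is (Billy's years, Chris's years).
SENTENCES = {
    ("silent", "silent"): (3, 3),
    ("silent", "tell"):   (10, 1),
    ("tell",   "silent"): (1, 10),
    ("tell",   "tell"):   (5, 5),
}

# Whatever Chris does, Billy serves strictly fewer years by telling,
# so "tell" dominates in a single round.
for chris in ("silent", "tell"):
    billy_if_silent = SENTENCES[("silent", chris)][0]
    billy_if_tell = SENTENCES[("tell", chris)][0]
    assert billy_if_tell < billy_if_silent
```

This is exactly why the open-ended repeated game is more interesting: a
strategy can condition on the opponent's past behavior instead of defecting
every round.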
There actually was a tournament of Prisoner's Dilemma strategies, and the
winner was a procedure which played tit-for-tat: it stayed silent on the
first round and afterwards just repeated whatever the opponent did on the
previous move.

The best-studied games, however, are zero-sum complete-information ones. The
first paper on computer chess was published by Claude Shannon* in 1950, and
in 1953 Alan Turing* published the first algorithm for the game (though he
never programmed it; he did the simulations by hand).

In games where there is a concept of a position, it is useful to have an
evaluation function which takes the current position and returns how good it
is for the player. This knowledge can then be incorporated using minimax to
figure out the best move. An example of an evaluation function for
TIC-TAC-TOE (albeit a poor one) is

    3*X_3 - 3*O_3 + 2*X_2 - 2*O_2 + ...

where X_3 is the number of sequences of 3 Xs on the board. For the purposes
of your connect4 project, your eval function will return a 1 in case of a
win, a 0 in case of a loss, and in-between values for various shades of
gray. Why do we need the shades of gray? We will see below.

Minimax Trees
------- -----

Developed originally by John von Neumann*, minimax trees allow for a simple
representation of games in a computer. In most games we have some
alternation of moves: first I play, then my opponent, then me again. Also,
usually at every point when it is my move I choose between several options,
as does my opponent. Furthermore, my opponent will always choose the move
which is best for him (and, because this is a zero-sum game, worst for me),
while I try to do the reverse. As a result we have trees with max and min
nodes. At a max node I choose which path to follow; at a min node my
opponent does. I will choose the highest value among the successors; my
opponent will choose the lowest (since he wants to prevent me from reaching
a good position). All evaluations take place at the leaves and are then
propagated up to the root.
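The tic-tac-toe evaluation above can be sketched concretely. The board
representation, the helper names, and the rule that a counted sequence must
contain no opposing marks are my own assumptions, not part of the notes:

```python
# All 8 winning lines of a 3x3 board: rows, columns, and both diagonals.
LINES = ([[(r, c) for c in range(3)] for r in range(3)]
         + [[(r, c) for r in range(3)] for c in range(3)]
         + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def count(board, player, n):
    """Lines holding exactly n of player's marks and none of the opponent's."""
    opponent = 'O' if player == 'X' else 'X'
    total = 0
    for line in LINES:
        marks = [board[r][c] for r, c in line]
        if marks.count(player) == n and marks.count(opponent) == 0:
            total += 1
    return total

def evaluate(board):
    # 3*X_3 - 3*O_3 + 2*X_2 - 2*O_2, as in the formula above.
    return (3 * count(board, 'X', 3) - 3 * count(board, 'O', 3)
            + 2 * count(board, 'X', 2) - 2 * count(board, 'O', 2))
```

For example, a board where X holds two cells of the top row and O holds only
the center scores +2 for X.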
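The propagation just described can be written as a short recursive
procedure. A minimal sketch, assuming the game tree has already been built
as nested lists whose innermost entries are leaf evaluations (a real program
would generate successors from a position instead):

```python
def minimax(node, maximizing):
    # A bare number is a leaf: return its evaluation directly.
    if not isinstance(node, list):
        return node
    # Roles swap at each level: I take the best successor value,
    # my opponent takes the one that is worst for me.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```

On the example tree below, `minimax([[[.3, .7], [.4, .5]], [[.9, .8], [1, .7]]], True)`
returns 0.9, the value shown at the root.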
example:

    Max                    0.9
                      /          \
    Min            0.5            0.9
                 /     \        /     \
    Max       0.7     0.5     0.9     1.0
              / \     / \     / \     / \
    Min      .3  .7  .4  .5  .9  .8  1  0.7

In minimax we always assume our opponent will find the best move, so we do
not take risks by going to a node where our opponent loses if he plays 7 of
the 8 possible moves but wins if he finds the 8th one.

What are the leaves? In the case of a simple game like TIC-TAC-TOE, we may
be able to evaluate right down to the final position with a winner and a
loser, and return 1, 0, or .5 as appropriate. In chess, however, the longest
possible game is about 5600 moves, and there are about 20 possible moves at
every point. 20^5600 is a VERY large number, and we have no chance of
evaluating all possible positions. This phenomenon is known as exponential
explosion. Because you can't evaluate the whole tree, you settle for
estimates: in chess you may be able to say "I have a queen for his knight; a
queen is worth 9 points and a knight 3, so I am 6 points better in this
position." The evaluation function allows you to place the leaves at just
the furthest level to which you have the time to calculate. It is also
possible to use iterative deepening (more on that below).

Pseudocode for minMax:

    minMax(P):
        If P is a leaf, return eval(P).
        Let K_1, K_2, ..., K_n be the successors of P.
        If P is a MAX node,
            return the largest of minMax(K_1), ..., minMax(K_n);
        else (P is a MIN node),
            return the smallest of minMax(K_1), ..., minMax(K_n).

Note that minimax is easily extensible to games of chance which have
concepts of position and move order, such as backgammon or poker. (The
implementation is left to the reader.)

AlphaBeta Pruning
--------- -------

The name AlphaBeta comes from the names of the two variables usually used
for the pruning.

CUTOFFS

A shallow cutoff is when the information needed to achieve a cutoff comes
from the node's parent. In the example below, A is the parent of B.

    Max       ???
             /
    Min     A
           / \
    Max  10   B
             / \
           14   C

Here we see that we can ignore the value of the subtree beginning with
node C.
This is because B will return the maximum of 14 and C, which will be greater
than or equal to 14; but this does not matter, because node A is a MIN node
and will return the minimum of 10 and B, which must be 10.

In a deep cutoff you have to look farther up the tree to find the reason we
can ignore some node. The root of the tree below is actually the
great-great-grandfather of X, where the cutoff will occur.

    Max       ???
             /   \
    Min    10     ??
                 /
    Max         F
               / \
    Min       E   G
             / \
            5   X

Here we see that we can ignore the value of X, because whatever node E
returns will be less than or equal to 5, but we already have a choice of 10
by following the other branch from the root node.

Let us define our two bounds, alpha and beta:

Alpha - the largest value seen so far at any MAX node in the ancestry of
        the current node, or at this node itself.
Beta  - the smallest value seen so far at any MIN node in the ancestry of
        the current node, or at this node itself.

If we are looking at a MAX node and the value of alpha rises above beta, we
can discontinue the calculation, since we know that the MIN node which had a
choice between us and the beta value will always choose the beta value. The
same goes for beta: if it falls below alpha, we can discontinue checking,
since the MAX node will always choose the successor with the value alpha.

The following is the pseudocode for AlphaBeta:

    ABProcedure: V(position J, alpha, beta)

    1.  If J is terminal, return eval(J). Else let J_1, ..., J_n be the n
        successors of J, and let k = 1. If J is a MAX node, go to 2;
        else go to 2'.
    2.  Set alpha = max[alpha, V(J_k, alpha, beta)].
    3.  If alpha >= beta, return beta.
    4.  If k == n, return alpha. Else set k = k+1 and go to 2.
    2'. Set beta = min[beta, V(J_k, alpha, beta)].
    3'. If beta <= alpha, return alpha.
    4'. If k == n, return beta. Else set k = k+1 and go to 2'.

How effective is AlphaBeta? It depends on how good you are at "guessing"
good moves.
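The ABProcedure translates almost line for line into a recursive function. A
sketch using the same nested-list trees as in the minimax section (this
"fail-hard" version returns beta or alpha on a cutoff, exactly as in steps 3
and 3'):

```python
def ab(node, alpha, beta, maximizing):
    if not isinstance(node, list):      # step 1: terminal, return eval(J)
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, ab(child, alpha, beta, False))   # step 2
            if alpha >= beta:           # step 3: cutoff, skip the rest
                return beta
        return alpha                    # step 4
    else:
        for child in node:
            beta = min(beta, ab(child, alpha, beta, True))      # step 2'
            if beta <= alpha:           # step 3': cutoff
                return alpha
        return beta                     # step 4'
```

Called as `ab(tree, float('-inf'), float('inf'), True)`, it returns the same
root value as plain minimax while never descending into pruned subtrees.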
If you always examine the moves from the best to the worst, the number of
cutoffs will be maximized: if P is the number of leaf nodes evaluated using
AlphaBeta and Q is the number evaluated by the usual minimax, then
P ~= sqrt(Q), a two-fold increase in searchable depth! If you always start
by looking at bad moves, AlphaBeta will run at the same rate as minimax.
Because AlphaBeta runs better when you consider good moves first, game
program authors very commonly use iterative deepening: run AB to depth 1,
remembering the evaluation results for the moves; then run to depth 2, but
instead of going from left to right, go from the best move to the worst;
then repeat at depth 3, and so on. Because each successive level takes
several times longer than the previous one, the time spent re-searching the
shallow levels is small, and the final depth of your search is almost
unaffected.

Transposition Tables
------------- ------

In some games the number of possible moves is large, but the number of
possible positions is small. This is the case in chess endgames (when there
are few pieces on the board). In these cases it very much pays to use
transposition tables. The idea behind these tables is that a position can be
reached through different combinations of moves, and it is wasteful to
re-evaluate positions. The implementation comes straight out of 61A:
memoization. When we finish evaluating a position (leaf or internal) we can
add it to a hashtable, and before we evaluate any node we check whether it
is already in our hashtable. One must be careful when doing this: the whole
board takes up some memory, and storing it in an uncompressed format is
expensive, considering the number of positions you may want to store in
your table.

* These are famous guys whose names you should know if you want to be a
  well-educated computer scientist. None of them, by the way, are best known
  for games.
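The memoization idea can be sketched in a few lines. The names and the tuple
key are mine; a real chess program would compress the position (for example
with Zobrist hashing) rather than store the whole board as the key:

```python
# Transposition table: position -> previously computed evaluation.
table = {}

def evaluate_with_table(position, evaluate):
    key = tuple(position)        # immutable, hashable stand-in for the board
    if key not in table:         # evaluate each distinct position only once
        table[key] = evaluate(position)
    return table[key]
```

Wrapping the node evaluation of minimax or AlphaBeta this way means a
position reached by two different move orders is evaluated only the first
time it is seen.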