marwk@levels.sait.edu.au (04/02/91)
> Go-moku or Runjyu
> -----------------
> Another comment:  In 1961 I wrote a playing/learning program (based on
> Samuel's Checker-learning method) for this game.  It (almost) ran on a
> computer now known as UNIVAC I but then known as THE UNIVAC.
>
> --Dick Wexelblat (rlw@ida.org) 703 845 6601

How can one write a program that learns?  I do not see how.

One has parameters in the heuristic evaluation functions, so how can these
change based on experience?

Thanks.

Ray
--
University of South Australia  | Plus ca change, plus c'est la meme chose.
P.O. Box 1                     | Ghing thien me how, ming thien gung me how.
Ingle Farm                     | Knobs, knobs everywhere,
South Australia                | just vary a knob to think!
minsky@media-lab.MEDIA.MIT.EDU (Marvin Minsky) (04/03/91)
In article <16087.27f874ec@levels.sait.edu.au> marwk@levels.sait.edu.au
writes in regard to Samuel's Checker-learning method:

>How can one write a program that learns?  I do not see how.
>
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

This is explained in the classic paper:

  A. L. Samuel, "Some studies in machine learning using the game of
  checkers," IBM J. Res. Dev., vol. 3, pp. 211-219, July 1959.

There is a summary of this in my "Steps toward AI" paper, which you can find
reprinted in a book called Computers and Thought, by Feigenbaum and Feldman.
Samuel uses several different learning methods, and has an important
discussion of the "credit assignment" problem.
markh@csd4.csd.uwm.edu (Mark William Hopkins) (04/04/91)
In article <16087.27f874ec@levels.sait.edu.au> marwk@levels.sait.edu.au writes:

>How can one write a program that learns?  I do not see how.
>
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

("Least Squares Method")

By minimizing the square of an error function, using any of a standard set of
minimization routines covered in numerical-methods texts...  It's up to you to
determine how to form the error function.  The most natural way is to use a
search technique to find an optimum value for some evaluation function looking
several moves ahead, and use that as the basis for determining corrections to
be made to the evaluation function applied to the current board.

You need boundary conditions.  Evaluation of a terminal board is done without
search or error-correction: it is simply scored as a WIN, LOSS (or DRAW).

Reinforcement learning uses more complicated techniques to accomplish the same
goal of training a predictor.  Also, boundary conditions don't always prove
necessary.  It is sometimes possible to train the program to learn what
constitutes a WIN, LOSS or DRAW board from zero knowledge.  I don't think
Samuel's checker-playing program used any boundary conditions...
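
A minimal sketch of that error-correction idea in Python (assuming a linear
evaluation over hand-chosen board features; extract_features and search_value
are hypothetical routines you would supply for your own game):

    # Sketch: tune the weights of a linear evaluation function by minimizing
    # the squared error between the static evaluation of a position and the
    # value returned by a deeper search of that same position.

    def evaluate(weights, features):
        """Static evaluation: weighted sum of hand-chosen board features."""
        return sum(w * f for w, f in zip(weights, features))

    def update_weights(weights, board, search_value, extract_features, rate=0.01):
        """One gradient step on the squared error, with the search value as target."""
        features = extract_features(board)   # e.g. mobility, open rows, threats
        target = search_value(board)         # value found by looking several moves ahead
        error = target - evaluate(weights, features)
        # d/dw of (target - w.f)^2 is -2*error*f; step downhill (the factor 2
        # is folded into the learning rate).
        return [w + rate * error * f for w, f in zip(weights, features)]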
rlw@IDA.ORG (Richard Wexelblat) (04/05/91)
In article <16087.27f874ec@levels.sait.edu.au> marwk@levels.sait.edu.au writes:

>How can one write a program that learns?  I do not see how.
>
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

If you define learning as "plays better over time, based on experience," then
the trivial algorithm I used was (in short form) this:

Assume a set of numeric values that are combined in some way to determine the
value of a move.  Determine in a suitable way (see below) that in a given
situation Move B (which the program didn't make) would have been better than
Move A (which the computer DID make).  Bump the values by a delta to increase
the likelihood of B being made the next time an A/B choice is encountered.

How to determine a better move?  Ask an expert... or play out all variations
of an endgame from a certain point... or play a fixed strategy against a
randomly modified one.  There are many variants based on having a program play
against itself.

--
--Dick Wexelblat (rlw@ida.org) 703 845 6601
  Can you accept an out of state sanity check?
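
A minimal sketch of that bump step in Python (assuming each move is scored as
a weighted combination of numeric features; the feature vectors and delta here
are illustrative, not the 1961 program):

    # Sketch: nudge the weights so the move judged better (B) outscores the
    # move the program actually made (A) the next time a similar choice arises.

    def score(weights, features):
        """Move value: weighted combination of the move's numeric features."""
        return sum(w * f for w, f in zip(weights, features))

    def bump(weights, features_a, features_b, delta=0.05):
        """Given that B should have beaten A, shift each weight toward B's features."""
        if score(weights, features_b) <= score(weights, features_a):
            # Raise weights where B's features exceed A's, lower them otherwise,
            # which increases score(B) - score(A) by a small amount.
            weights = [w + delta * (fb - fa)
                       for w, fa, fb in zip(weights, features_a, features_b)]
        return weights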