marwk@levels.sait.edu.au (04/02/91)
> Go-moku or Runjyu
> -----------------
> Another comment: In 1961 I wrote a playing/learning program (based on
> Samuel's Checker-learning method) for this game. It (almost) ran on a
> computer now known as UNIVAC I but then known as THE UNIVAC.
>
> --Dick Wexelblat (rlw@ida.org) 703 845 6601

How can one write a program that learns? I do not see how.

One has parameters in the heuristic evaluation functions, so how can these
change based on experience?

Thanks.

Ray
--
University of South Australia | Plus ca change, plus c'est la meme chose.
P.O. Box 1                    | Ghing thien me how, ming thien gung me how.
Ingle Farm                    | Knobs, knobs everywhere,
South Australia               | just vary a knob to think!
minsky@media-lab.MEDIA.MIT.EDU (Marvin Minsky) (04/03/91)
In article <16087.27f874ec@levels.sait.edu.au> marwk@levels.sait.edu.au
writes in regard to Samuel's Checker-learning method:

>How can one write a program that learns? I do not see how.
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

This is explained in the classic paper: A. L. Samuel, "Some studies in
machine learning using the game of checkers," IBM J. Res. Dev., vol. 3,
pp. 211-219, July 1959.

There is a summary of this in my "Steps toward AI" paper, which you can
find reprinted in a book called Computers and Thought, by Feigenbaum and
Feldman. Samuel uses several different learning methods, and has an
important discussion of the "credit assignment" problem.
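[Editor's sketch: the flavor of Samuel's "learning by generalization" is that
the weights of a linear evaluation function are nudged so that the static
score of a position moves toward the value returned by a deeper look-ahead
search. The Python below is only an illustration of that idea; the feature
values, learning rate, and linear form are assumptions, not Samuel's actual
procedure.]

def evaluate(features, weights):
    """Static evaluation: a weighted sum of board features."""
    return sum(w * f for w, f in zip(weights, features))

def learn_from_position(features, backed_up_value, weights, rate=0.01):
    """Adjust weights so the static score moves toward the value
    obtained by searching ahead from this position."""
    error = backed_up_value - evaluate(features, weights)
    return [w + rate * error * f for w, f in zip(weights, features)]

# Usage: after a look-ahead search returns backed_up_value for a position,
#   weights = learn_from_position([2, -1, 3], backed_up_value=1.5,
#                                 weights=[0.5, 0.2, -0.1])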
rlw@IDA.ORG (Richard Wexelblat) (04/05/91)
In article <16087.27f874ec@levels.sait.edu.au> marwk@levels.sait.edu.au writes:

>How can one write a program that learns? I do not see how.
>
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

If you define learning as "plays better over time, based on experience,"
then the trivial algorithm I used was (in short form) this:

Assume a set of numeric values that are combined in some way to determine
the value of a move. Determine in a suitable way (see below) that in a
given situation Move B (which the program didn't make) would have been
better than Move A (which the computer DID make). Bump the values by a
delta to increase the likelihood of B being made the next time an A/B
choice is encountered.

How to determine a better move? Ask an expert... or play out all variations
of an endgame from a certain point... or play a fixed vs. a randomly
modified strategy. There are many variants based on having a program play
against itself.
--
--Dick Wexelblat (rlw@ida.org) 703 845 6601
Can you accept an out of state sanity check?
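[Editor's sketch: a minimal Python illustration of the "bump the values"
scheme described above, assuming each move is scored as a weighted sum of
features. The names and the fixed delta are illustrative, not the 1961
program.]

def score(move_features, weights):
    """Value of a move: a weighted combination of its features."""
    return sum(w * f for w, f in zip(weights, move_features))

def bump_toward_better(made_features, better_features, weights, delta=0.05):
    """Nudge each weight so the move we should have made (B) scores a bit
    higher, and the move we actually made (A) a bit lower, next time."""
    return [w + delta * (fb - fa)
            for w, fa, fb in zip(weights, made_features, better_features)]

# Example: an expert (or an endgame playout) says B was better than A, so
#   weights = bump_toward_better(made_features=[1, 0, 2],
#                                better_features=[0, 1, 3],
#                                weights=[0.3, 0.3, 0.3])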