[rec.games.misc] How do games learn?

marwk@levels.sait.edu.au (04/02/91)

> Go-moku or Renju
> ----------------
> Another comment:  In 1961 I wrote a playing/learning program (based on
> Samuel's Checker-learning method) for this game.  It (almost) ran on a
> computer now known as UNIVAC I but then known as THE UNIVAC.
>
> --Dick Wexelblat  (rlw@ida.org) 703 845 6601

How can one write a program that learns?  I do not see how.

One has parameters in the heuristic evaluation functions, so how can these
change based on experience?

Thanks.
Ray
--
University of South Australia   | Plus ca change, plus c'est la meme chose.
P.O. Box 1                      | Ghing thien me how, ming thien gung me how.
Ingle Farm                      | Knobs, knobs everywhere,
South Australia                 |              just vary a knob to think!

minsky@media-lab.MEDIA.MIT.EDU (Marvin Minsky) (04/03/91)

In article <16087.27f874ec@levels.sait.edu.au>
marwk@levels.sait.edu.au writes in regard to Samuel's Checker-learning method
>
>How can one write a program that learns?  I do not see how.
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

This is explained in the classic paper, A. L. Samuel, "Some studies in
machine learning using the game of checkers," IBM J. Res. Dev., vol. 3,
pp. 211-219, July 1959.  There is a summary of this in my "Steps toward
AI" paper, which you can find reprinted in a book called Computers and
Thought, edited by Feigenbaum and Feldman.  Samuel uses several
different learning methods, and has an important discussion of the
"credit assignment" problem.

rlw@IDA.ORG (Richard Wexelblat) (04/05/91)

In article <16087.27f874ec@levels.sait.edu.au> marwk@levels.sait.edu.au writes:
>How can one write a program that learns?  I do not see how.
>
>One has parameters in the heuristic evaluation functions, so how can these
>change based on experience?

If you define learning as "plays better over time, based on experience"
then the trivial algorithm I used was, in short form, as follows.

Assume a set of numeric values that are combined in some way to
determine the value of a move.  Determine in a suitable way (see below)
that in a given situation Move B (which the program didn't make) would
have been better than Move A (which the computer DID make).  Bump the
values by a delta to increase the likelihood of B being made next time
an A/B choice is encountered.

How to determine a better move?  Ask an expert... or play out all variations
of an endgame from a certain point... or play a fixed vs. a randomly
modified strategy.  There are many variants based on having a program
play against itself.
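
One way to realize the "fixed vs. randomly modified strategy" variant,
again only a sketch: the play_game stub below stands in for a real game
engine, which is assumed rather than shown.

    import random

    # Placeholder for the real engine: play one game between two weight
    # vectors and return +1, -1, or 0 from the first player's viewpoint.
    # A coin flip stands in for it here so the sketch runs on its own.
    def play_game(weights_a, weights_b):
        return random.choice([1, -1, 0])

    # Repeatedly pit the current weights against a randomly perturbed
    # copy over a short match; keep whichever set of weights wins.
    def self_play_tune(weights, rounds=100, games=20, noise=0.1):
        for _ in range(rounds):
            challenger = [w + random.uniform(-noise, noise) for w in weights]
            score = sum(play_game(challenger, weights) for _ in range(games))
            if score > 0:
                weights = challenger   # the modified strategy did better
        return weights

    print(self_play_tune([1.0, 0.2, 0.1]))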

-- 
--Dick Wexelblat  (rlw@ida.org) 703 845 6601
  Can you accept an out of state sanity check?